Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Bayes Academy Development Report 2 - improved data visualization

Kaj_Sotala 18 December 2014 10:11PM

See here for the previous update if you missed / forgot it.

This update has no new game content, only new graphics.

I wasn’t terribly happy about the graphical representation of the various nodes in the last update. Especially in the first two networks, if you didn’t read the descriptions of the nodes carefully, it was very easy to just click your way through them without really having a clue of what the network was actually doing. Needless to say, for a game that’s supposed to teach how the networks function, this is highly non-optimal.

Here’s the representation that I’m now experimenting with: the truth table of the nodes is represented graphically inside the node. The prior variable at the top doesn’t really have a truth table, it’s just true or false. The “is” variable at the bottom is true if its parent is true, and false if its parent is false.

You may remember that in the previous update, unobservable nodes were represented in grayscale. I ended up dropping that, because that would have been confusing in this representation: if the parent is unobservable, should the blobs representing its truth values in the child node be in grayscale as well? Both “yes” and “no” answers felt confusing.

Instead the observational state of a node is now represented by its border color. Black for unobservable, gray for observable, no border for observed. The metaphor is supposed to be something like, a border is a veil of ignorance blocking us from seeing the node directly, but if the veil is gray it’s weak enough to be broken, whereas a black veil is strong enough to resist a direct assault. Or something.

When you observe a node, not only does its border disappear, but the truth table entries that get reduced to a zero probability disappear, to be replaced by white boxes. I experimented with having the eliminated entries still show up in grayscale, so you could e.g. see that the “is” node used to contain the entry for (false -> false), but felt that this looked clearer.

The “or” node at the bottom is getting a little crowded, but hopefully not too crowded. Since we know that its value is “true”, the truth table entry showing (false, false -> false) shows up in all whites. It’s also already been observed, so it starts without a border.

After we observe that there’s no monster behind us, the “or” node loses its entries for (monster, !waiting -> looks) and (monster, waiting -> looks), leaving only (!monster, waiting -> looks): meaning that the boy must be waiting for us to answer.
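The elimination described above can be sketched in a few lines of Python. This is only an illustration, not the game's actual code: the names `or_table` and `consistent` are made up for the example, and the "or" node is assumed to be a plain logical OR of its two parents.

```python
from itertools import product

# Truth table of the "or" node: one entry per combination of parent values.
# Each entry is (monster, waiting, looks), with looks = monster or waiting.
or_table = [(m, w, m or w) for m, w in product([True, False], repeat=2)]

def consistent(entries, **observed):
    """Keep only the entries that agree with every observed value."""
    names = ("monster", "waiting", "looks")
    return [
        e for e in entries
        if all(e[names.index(k)] == v for k, v in observed.items())
    ]

# We already know the boy is looking at us, which rules out
# (false, false -> false).
after_looks = consistent(or_table, looks=True)

# Looking behind us reveals that there is no monster.
after_monster = consistent(after_looks, monster=False)

print(after_monster)  # [(False, True, True)]
```

The single surviving entry is (!monster, waiting -> looks), matching what the node shows after the observation.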

This could still be made clearer: currently the network updates instantly. I’m thinking about adding a brief animation where the “monster” variable would first be revealed as false, which would then propagate an update to the values of “looks at you” (with e.g. the red tile in “monster” blinking at the same time as the now-invalid truth table entries, and when the tiles stopped blinking, those now-invalid entries would have disappeared), and that would in turn propagate the update to the “waiting” node, deleting the red color from it. But I haven’t yet implemented this.

The third network is where things get a little tricky. The “attacking” node is of type “majority vote” - i.e. it’s true if at least two of its parents are true, and false otherwise. That would make for a truth table with eight entries, each holding four blobs, and we could already see the “or” node in the previous screen getting crowded. I’m not quite sure what to do here. At this moment I’m thinking of just leaving the node as is, and displaying more detailed information in the sidebar.
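To make the size of that table concrete, here is a small Python sketch that enumerates it. The helper name `majority` is invented for the example, and the three parents are assumed to be the body-language signals from the fencing scene (dancing legs, accelerating midbody, ferocious eyes).

```python
from itertools import product

def majority(*parents):
    """True if at least two of the parent values are true."""
    return sum(parents) >= 2

# One row per combination of the three parents: 2**3 = 8 rows,
# each holding three parent values plus the resulting node value.
rows = [(*p, majority(*p)) for p in product([False, True], repeat=3)]

for row in rows:
    print(" ".join("T" if v else "F" for v in row))
```

Eight rows of four values each is 32 blobs in a single node, which is why the full table is hard to fit inside it.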

Here’s another possible problem. Just having the truth table entries works fine to make it obvious where the overall probability of the node comes from… as long as the valid values of the entries are restricted to “possible” and “impossible”. Then you can see at a glance that, say, of the three possible entries, two would make this node true and one would make it false, so there’s a ⅔ chance of it being true.
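The "count the possible entries" reading can be written out as a short Python sketch. It uses the "or" network as the example and assumes that all surviving entries are equally likely, which holds when both parents start from a 50-50 prior. The function name `chance_true` is made up for the illustration.

```python
from fractions import Fraction

# Surviving entries of the "or" node once "looks at you" is known to be true,
# written as (monster, waiting). All three are assumed equally likely.
possible = [(True, True), (True, False), (False, True)]

def chance_true(entries, index):
    """Fraction of the possible entries in which the chosen parent is true."""
    hits = sum(1 for e in entries if e[index])
    return Fraction(hits, len(entries))

p_monster = chance_true(possible, 0)
p_waiting = chance_true(possible, 1)
print(p_monster, p_waiting)  # 2/3 2/3
```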

But in this screen, that has ceased to be the case. The “attacking” node has a 75% chance of being true, meaning that, for instance, the “is / block” node’s “true -> true” entry also has a 75% chance of being the right one. This isn’t reflected in the truth table visualization. I thought of adding small probability bars under each truth table entry, or having the size of the truth table blobs reflect their probability, but then I’d have to make the nodes even bigger, and it feels like it would easily start looking cluttered again. But maybe it’d be the right choice anyway? Or maybe just put the more detailed information in the sidebar? I’m not sure of the best thing to do here.
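A small Python sketch of why plain counting stops working here. It takes the 75% figure from the paragraph above as given; the names `entry_weight`, `by_counting` and `by_weight` are invented for the example and are not part of the game.

```python
# Entries of the "is / block" node, written as (attacking, block).
entries = [(True, True), (False, False)]

# Probability that "attacking" is true, taken from the text above.
p_attacking = 0.75

def entry_weight(entry, p_parent):
    """Probability that this entry is the one that applies."""
    parent_value, _ = entry
    return p_parent if parent_value else 1 - p_parent

# Counting entries treats both as equally likely and gets 1/2.
by_counting = sum(1 for _, block in entries if block) / len(entries)

# Weighting each entry by its parent's probability gets 3/4.
by_weight = sum(entry_weight(e, p_attacking) for e in entries if e[1])

print(by_counting, by_weight)  # 0.5 0.75
```

Probability bars or blob sizes would in effect be displaying `entry_weight` for each entry.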

If anyone has good suggestions, I would be grateful to get advice from people who have more of a visual designer gene than I do!

[link] The Philosophy of Intelligence Explosions and Advanced Robotics

12 Kaj_Sotala 02 December 2014 03:44AM

The philosopher John Danaher has posted a list of all the posts that he's written on the topic of robotics and AI. Below is the current version of the list: he says that he will keep updating the page as he writes more.

  • The Singularity: Overview and Framework: This was my first attempt to provide a general overview and framework for understanding the debate about the technological singularity. I suggested that the debate could be organised around three main theses: (i) the explosion thesis -- which claims that there will be an intelligence explosion; (ii) the unfriendliness thesis -- which claims that an advanced artificial intelligence is likely to be "unfriendly"; and (iii) the inevitability thesis -- which claims that the creation of an unfriendly AI will be difficult to avoid, if not inevitable.
  • The Singularity: Overview and Framework Redux: This was my second attempt to provide a general overview and framework for understanding the debate about the technological singularity. I tried to reduce the framework down to two main theses: (i) the explosion thesis and (ii) the unfriendliness thesis.
  • AIs and the Decisive Advantage Thesis: Many people claim that an advanced artificial intelligence would have decisive advantages over human intelligences. Is this right? In this post, I look at Kaj Sotala's argument to that effect.
  • Is there a case for robot slaves? - If robots can be persons -- in the morally thick sense of "person" -- then surely it would be wrong to make them cater to our every whim? Or would it? Steve Petersen argues that the creation of robot slaves might be morally permissible. In this post, I look at what he has to say.
  • The Ethics of Robot Sex: A reasonably self-explanatory title. This post looks at the ethical issues that might arise from the creation of sex robots.
  • Will sex workers be replaced by robots? A Precis: A short summary of a longer article examining the possibility of sex workers being replaced by robots. Contrary to the work of others, I suggest that sex work might be resilient to the phenomenon of technological unemployment.
  • Bostrom on Superintelligence (2) The Instrumental Convergence Thesis: The second part in my series on Bostrom's book. This one examines the instrumental convergence thesis, according to which an intelligent agent, no matter what its final goals may be, is likely to converge upon certain instrumental goals that are unfriendly to human beings.

My experience of the recent CFAR workshop

29 Kaj_Sotala 27 November 2014 04:17PM

Originally posted at my blog.


I just got home from a four-day rationality workshop in England that was organized by the Center For Applied Rationality (CFAR). It covered a lot of content, but if I had to choose a single theme that united most of it, it was listening to your emotions.

That might sound like a weird focus for a rationality workshop, but cognitive science has shown that the intuitive and emotional part of the mind (”System 1”) is both in charge of most of our behavior, and also carries out a great deal of valuable information-processing of its own (it’s great at pattern-matching, for example). Much of the workshop material was aimed at helping people reach a greater harmony between their System 1 and their verbal, logical System 2. Many of people’s motivational troubles come from the goals of their two systems being somehow at odds with each other, and we were taught to have our two systems have a better dialogue with each other, harmonizing their desires and making it easier for information to cross from one system to the other and back.

To give a more concrete example, there was the technique of goal factoring. You take a behavior that you often do but aren’t sure why, or which you feel might be wasted time. Suppose that you spend a lot of time answering e-mails that aren’t actually very important. You start by asking yourself: what’s good about this activity, that makes me do it? Then you try to listen to your feelings in response to that question, and write down what you perceive. Maybe you conclude that it makes you feel productive, and it gives you a break from tasks that require more energy to do.

Next you look at the things that you came up with, and consider whether there’s a better way to accomplish them. There are two possible outcomes here. Either you conclude that the behavior is an important and valuable one after all, meaning that you can now be more motivated to do it. Alternatively, you find that there would be better ways of accomplishing all the goals that the behavior was aiming for. Maybe taking a walk would make for a better break, and answering more urgent e-mails would provide more value. If you were previously using two hours per day on the unimportant e-mails, possibly you could now achieve more in terms of both relaxation and actual productivity by spending an hour on a walk and an hour on the important e-mails.

At this point, you consider your new plan, and again ask yourself: does this feel right? Is this motivating? Are there any slight pangs of regret about giving up my old behavior? If you still don’t want to shift your behavior, chances are that you still have some motive for doing this thing that you have missed, and the feelings of productivity and relaxation aren’t quite enough to cover it. In that case, go back to the step of listing motives.

Or, if you feel happy and content about the new direction that you’ve chosen, victory!

Notice how this technique is all about moving information from one system to another. System 2 notices that you’re doing something but it isn’t sure why that is, so it asks System 1 for the reasons. System 1 answers, ”here’s what I’m trying to do for us, what do you think?” Then System 2 does what it’s best at, taking an analytic approach and possibly coming up with better ways of achieving the different motives. Then it gives that alternative approach back to System 1 and asks, would this work? Would this give us everything that we want? If System 1 says no, System 2 gets back to work, and the dialogue continues until both are happy.

Again, I emphasize the collaborative aspect between the two systems. They’re allies working for common goals, not enemies. Too many people tend towards one of two extremes: either thinking that their emotions are stupid and something to suppress, or completely disdaining the use of logical analysis. Both extremes miss out on the strengths of the system that is neglected, and make it unlikely for the person to get everything that they want.

As I was heading back from the workshop, I considered doing something that I noticed feeling uncomfortable about. Previous meditation experience had already made me more likely to just attend to the discomfort rather than trying to push it away, but inspired by the workshop, I went a bit further. I took the discomfort, considered what my System 1 might be trying to warn me about, and concluded that it might be better to err on the side of caution this time around. Finally – and this wasn’t a thing from the workshop, it was something I invented on the spot – I summoned a feeling of gratitude and thanked my System 1 for having been alert and giving me the information. That might have been a little overblown, since neither system should actually be sentient by itself, but it still felt like a good mindset to cultivate.

Although it was never mentioned in the workshop, what comes to mind is the concept of wu-wei from Chinese philosophy, a state of ”effortless doing” where all of your desires are perfectly aligned and everything comes naturally. In the ideal form, you never need to force yourself to do something you don’t want to do, or to expend willpower on an unpleasant task. Either you want to do something and do it, or you don’t want to do it and don’t.

A large number of the workshop’s classes – goal factoring, aversion factoring and calibration, urge propagation, comfort zone expansion, inner simulation, making hard decisions, Hamming questions, againstness – were aimed at more or less this. Find out what System 1 wants, find out what System 2 wants, dialogue, aim for a harmonious state between the two. Then there were a smaller number of other classes that might be summarized as being about problem-solving in general.

The classes about the different techniques were interspersed with ”debugging sessions” of various kinds. In the beginning of the workshop, we listed different bugs in our lives – anything about our lives that we weren’t happy with, with the suggested example bugs being things like ”every time I talk to so-and-so I end up in an argument”, ”I think that I ‘should’ do something but don’t really want to”, and ”I’m working on my dissertation and everything is going fine – but when people ask me why I’m doing a PhD, I have a hard time remembering why I wanted to”. After we’d had a class or a few, we’d apply the techniques we’d learned to solving those bugs, either individually, in pairs, or small groups with a staff member or volunteer TA assisting us. Then a few more classes on techniques and more debugging, classes and debugging, and so on.

The debugging sessions were interesting. Often when you ask someone for help on something, they will answer with direct object-level suggestions – if your problem is that you’re underweight and you would like to gain some weight, try this or that. Here, the staff and TAs would eventually get to the object-level advice as well, but first they would ask – why don’t you want to be underweight? Okay, you say that you’re not completely sure but based on the other things that you said, here’s a stupid and quite certainly wrong theory of what your underlying reasons for it might be, how does that theory feel? Okay, you said that it’s mostly on the right track, so now tell me what’s wrong with it? If you feel that gaining weight would make you more attractive, do you feel that this is the most effective way of achieving that?

Only after you and the facilitator had reached some kind of consensus on why you thought that something was a bug, and made sure that the problem you were discussing was actually the best way to address those reasons, would it be time for the more direct advice.

At first, I had felt that I didn’t have very many bugs to address, and that I had mostly gotten reasonable advice for them that I might try. But then the workshop continued, and there were more debugging sessions, and I had to keep coming up with bugs. And then, under the gentle poking of others, I started finding the underlying, deep-seated problems, and some things that had been motivating my actions for the last several months without me always fully realizing it. At the end, when I looked at my initial list of bugs that I’d come up with in the beginning, most of the first items on the list looked hopelessly shallow compared to the later ones.

Often in life you feel that your problems are silly, and that you are affected by small stupid things that ”shouldn’t” be a problem. There was none of that at the workshop: it was tacitly acknowledged that being unreasonably hindered by ”stupid” problems is just something that brains tend to do.  Valentine, one of the staff members, gave a powerful speech about ”alienated birthrights” – things that all human beings should be capable of engaging in and enjoying, but which have been taken from people because they have internalized beliefs and identities that say things like ”I cannot do that” or ”I am bad at that”. Things like singing, dancing, athletics, mathematics, romantic relationships, actually understanding the world, heroism, tackling challenging problems. To use his analogy, we might not be good at these things at first, and may have to grow into them and master them the way that a toddler grows to master her body. And like a toddler who’s taking her early steps, we may flail around and look silly when we first start doing them, but these are capacities that – barring any actual disabilities – are a part of our birthright as human beings, which anyone can ultimately learn to master.

Then there were the people, and the general atmosphere of the workshop. People were intelligent, open, and motivated to work on their problems, help each other, and grow as human beings. After a long, cognitively and emotionally exhausting day at the workshop, people would then shift to entertainment ranging from wrestling to telling funny stories of their lives to Magic: the Gathering. (The game of ”bunny” was an actual scheduled event on the official agenda.) And just plain talk with each other, in a supportive, non-judgemental atmosphere. It was the people and the atmosphere that made me the most reluctant to leave, and I miss them already.

Would I recommend CFAR’s workshops to others? Although my above description may sound rather gushingly positive, my answer still needs to be a qualified ”mmmaybe”. The full price tag is quite hefty, though financial aid is available and I personally got a very substantial scholarship, with the agreement that I would pay it at a later time when I could actually afford it.

Still, the biggest question is, will the changes from the workshop stick? I feel like I have gained a valuable new perspective on emotions, a number of useful techniques, made new friends, strengthened my belief that I can do the things that I really set my mind on, and refined the ways by which I think of the world and any problems that I might have – but aside from the new friends, all of that will be worthless if it fades away in a week. If it does, I would have to judge even my steeply discounted price as ”not worth it”. That said, the workshops do have a money-back guarantee if you’re unhappy with the results, so if it really feels like it wasn’t worth it, I can simply choose to not pay. And if all the new things do end up sticking, it might still turn out that it would have been worth paying even the full, non-discounted price.

CFAR does have a few ways by which they try to make the things stick. There will be Skype follow-ups with their staff, for talking about how things have been going since the workshop. There is a mailing list for workshop alumni, and the occasional events, though the physical events are very US-centric (and in particular, San Francisco Bay Area-centric).

The techniques that we were taught are still all more or less experimental, and are being constantly refined and revised according to people’s experiences. I have already been thinking of a new skill that I had been playing with for a while before the workshop, and which has a bit of that ”CFAR feel” – I will aim to have it written up soon and sent to the others, and maybe it will eventually make its way to the curriculum of a future workshop. That should help keep me engaged as well.

We shall see. Until then, as they say in CFAR – to victory!

Bayes Academy: Development report 1

45 Kaj_Sotala 19 November 2014 10:35PM

Some of you may remember me proposing a game idea that went by the name of The Fundamental Question. Some of you may also remember me talking a lot about developing an educational game about Bayesian Networks for my MSc thesis, but not actually showing you much in the way of results.

Insert the usual excuses here. But thanks to SSRIs and mytomatoes.com and all kinds of other stuff, I'm now finally on track towards actually accomplishing something. Here's a report on a very early prototype.

This game has basically two goals: to teach its players something about Bayesian networks and probabilistic reasoning, and to be fun. (And third, to let me graduate by giving me material for my Master's thesis.)

We start with the main character stating that she is nervous. Hitting any key, the player proceeds through a number of lines of internal monologue:

I am nervous.

I’m standing at the gates of the Academy, the school where my brother Opin was studying when he disappeared. When we asked the school to investigate, they were oddly reluctant, and told us to drop the issue.

The police were more helpful at first, until they got in contact with the school. Then they actually started threatening us, and told us that we would get thrown in prison if we didn’t forget about Opin.

That was three years ago. Ever since it happened, I’ve been studying hard to make sure that I could join the Academy once I was old enough, to find out what exactly happened to Opin. The answer lies somewhere inside the Academy gates, I’m sure of it.

Now I’m finally 16, and facing the Academy entrance exams. I have to do everything I can to pass them, and I have to keep my relation to Opin a secret, too. 

???: “Hey there.”

Eep! Someone is talking to me! Is he another applicant, or a staff member? Wait, let me think… I’m guessing that an applicant would look a lot younger than a staff member! So, to find that out… I should look at him!

[You are trying to figure out whether the voice you heard is a staff member or another applicant. While you can't directly observe his staff-nature, you believe that he'll look young if he's an applicant, and like an adult if he's a staff member. You can look at him, and therefore reveal his staff-nature, by right-clicking on the node representing his appearance.]

Here is our very first Bayesian Network! Well, it's not really much of a network: I'm starting with the simplest possible case in order to provide an easy start for the player. We have one node that cannot be observed ("Student", its hidden nature represented by showing it in greyscale), and an observable node ("Young-looking") whose truth value is equal to that of the Student node. All nodes are binary random variables, either true or false. 

According to our current model of the world, "Student" has a 50% chance of being true, so it's half-colored in white (representing the probability of it being true) and half-colored in black (representing the probability of it being false). "Young-looking" inherits its probability directly. The player can get a bit of information about the two nodes by left-clicking on them.

The game also offers alternate color schemes for colorblind people who may have difficulties distinguishing red and green.

Now we want to examine the person who spoke to us. Let's look at him, by right-clicking on the "Young-looking" node.

Not too many options here, because we're just getting started. Let's click on "Look at him", and find out that he is indeed young, and thus a student.

This was the simplest type of minigame offered within the game. You are given a set of hidden nodes whose values you're tasked with discovering by choosing which observable nodes to observe. Here the player had no way to fail, but later on, the minigames will involve a time limit and too many observable nodes to inspect within that time limit. It then becomes crucial to understand how probability flows within a Bayesian network, and which nodes will actually let you know the values of the hidden nodes.

The story continues!

Short for an adult, face has boyish look, teenagerish clothes... yeah, he looks young!

He's a student!

...I feel like I’m overthinking things now.

...he’s looking at me.

I’m guessing he’s either waiting for me to respond, or there’s something to see behind me, and he’s actually looking past me. If there isn’t anything behind me, then I know that he must be waiting for me to respond.

Maybe there's a monster behind me, and he's paralyzed with fear! I should check that possibility before it eats me!

[You want to find out whether the boy is waiting for your reply or staring at a monster behind you. You know that he's looking at you, and your model of the world suggests that he will only look in your direction if he's waiting for you to reply, or if there's a monster behind you. So if there's no monster behind you, you know that he's waiting for you to reply!]

Slightly more complicated network, but still, there's only one option here. Oops, apparently the "Looks at you" node says it's an observable variable that you can right-click to observe, despite the fact that it's already been observed. I need to fix that.

Anyway, right-clicking on "Attacking monster" brings up a "Look behind you" option, which we'll choose.

You see nothing there. Besides trees, that is.

Boy: “Um, are you okay?”

“Yeah, sorry. I just… you were looking in my direction, and I wasn’t sure of whether you were expecting me to reply, or whether there was a monster behind me.”

He blinks.

Boy: “You thought that there was a reasonable chance for a monster to be behind you?”

I’m embarrassed to admit it, but I’m not really sure of what the probability of a monster having snuck up behind me really should have been.

My studies have entirely focused on getting into this school, and Monsterology isn’t one of the subjects on the entrance exam!

I just went with a 50-50 chance since I didn’t know any better.

Boy: “Okay, look. Monsterology is my favorite subject. Monsters avoid the Academy, since it’s surrounded by a mystical protective field. There’s no chance of them getting even near! 0 percent chance.”

“Oh. Okay.”

[Your model of the world has been updated! The prior of the variable 'Monster Near The Academy' is now 0%.]

Then stuff happens and they go stand in line for the entrance exam or something. I haven't written this part. Anyway, then things get more exciting, for a wild monster appears!

Stuff happens


Huh, the monster is carrying a sword.

Well, I may not have studied Monsterology, but I sure did study fencing!

[You draw your sword. Seeing this, the monster rushes at you.]

He looks like he's going to strike. But is it really a strike, or is it a feint?

If it's a strike, I want to block and counter-attack. But if it's a feint, that leaves him vulnerable to my attack.

I have to choose wisely. If I make the wrong choice, I may be dead.

What did my master say? If the opponent has at least two of dancing legs, an accelerating midbody, and ferocious eyes, then it's an attack!

Otherwise it's a feint! Quick, I need to read his body language before it's too late!

Now we get to the second type of minigame! Here, you again need to discover the values of some number of hidden variables within a time limit, but this time it is in order to find out the consequences of your decision. In this one, the consequence is simple - either you live or you die. I'll let the screenshot and tutorial text speak for themselves:

[Now for some actual decision-making! The node in the middle represents the monster's intention to attack (or to feint, if it's false). Again, you cannot directly observe his intention, but on the top row, there are things about his body language that signal his intention. If at least two of them are true, then he intends to attack.]

[Your possible actions are on the bottom row. If he intends to attack, then you want to block, and if he intends to feint, you want to attack. You need to inspect his body language and then choose an action based on his intentions. But hurry up! Your third decision must be an action, or he'll slice you in two!]

In reality, the top three variables are not really independent of each other. We want to make sure that the player can always win this battle despite only having three actions. That's two actions for inspecting variables, and one action for actually making a decision. So this battle is rigged: either the top three variables are all true, or they're all false.

...actually, now that I think of it, the order of the variables is wrong. Logically, the body language should be caused by the intention to attack, and not vice versa, so the arrows should point from the intention to body language. I'll need to change that. I got these mixed up because the prototypical exemplar of a decision minigame is one where you need to predict someone's reaction from their personality traits, and there the personality traits do cause the reaction. Anyway, I want to get this post written before I go to bed, so I won't change that now.

Right-clicking "Dancing legs", we now see two options besides "Never mind"!

We can find out the dancingness of the enemy's legs by thinking about our own legs - we are well-trained, so our legs are instinctively mirroring our opponent's actions to prevent them from getting an advantage over us - or by just instinctively feeling where they are, without the need to think about them! Feeling them would allow us to observe this node without spending an action.

Unfortunately, feeling them has "Fencing 2" as a prerequisite skill, and we don't have that. Nor could we have it at this point in the game. The option is just there to let the player know that there are skills to be gained in this game, and to make them look forward to the moment when they can actually gain that skill, as well as to give them an idea of how the skill can be used.

Anyway, we take a moment to think of our legs, and even though our opponent gets closer to us in that time, we realize that our legs are dancing! So his legs must be dancing as well!

With our insider knowledge, we now know that he's attacking, and we could pick "Block" right away. But let's play this through. The network has automatically recalculated the probabilities to reflect our increased knowledge, and is now predicting a 75% chance for our enemy to be attacking, and for "Blocking" to thus be the right decision to make.
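Both numbers can be reproduced with a short enumeration in Python. This is a sketch under stated assumptions rather than the game's own inference code: the two unobserved signals are assumed to be independent with 50-50 priors (which is what yields 75%), and all variable names are made up for the example.

```python
from itertools import product

def majority(*values):
    """True if at least two of the values are true."""
    return sum(values) >= 2

# Model the network uses: "dancing legs" is observed to be true, while the
# other two signals are treated as independent 50-50 variables.
worlds = [(True, body, eyes) for body, eyes in product([True, False], repeat=2)]
p_attack_model = sum(majority(*w) for w in worlds) / len(worlds)

# How the battle is actually rigged: all three signals share one value.
rigged = [(v, v, v) for v in (True, False)]
matching = [w for w in rigged if w[0]]  # worlds where the legs are dancing
p_attack_rigged = sum(majority(*w) for w in matching) / len(matching)

print(p_attack_model, p_attack_rigged)  # 0.75 1.0
```

The gap between the two results is the "insider knowledge": the network does not know that the signals are perfectly correlated, so it stays at 75% even though a single observation already settles the outcome.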

Next we decide to find out what his eyes say, by matching our gaze with his. Again, there would be a special option that cost us no time - this time around, one enabled by Empathy 1 - but we again don't have that option.

Except that his gaze is so ferocious that we are forced to look away! While we are momentarily distracted, he closes the distance, ready to make his move. But now we know what to do... block!


Now the only thing that remains to do is to ask our new-found friend for an explanation.

"You told me there was a 0% chance of a monster near the academy!"

Boy: “Ehh… yeah. I guess I misremembered. I only read like half of our course book anyway, it was really boring.”

“Didn’t you say that Monsterology was your favorite subject?”

Boy: “Hey, that only means that all the other subjects were even more boring!”

“. . .”

I guess I shouldn’t put too much faith in what he says.

[Your model of the world has been updated! The prior of the variable 'Monster Near The Academy' is now 50%.]

[Your model of the world has been updated! You have a new conditional probability variable: 'True Given That The Boy Says It's True', 25%]

And that's all for now. Now that the basic building blocks are in place, future progress ought to be much faster.


As you might have noticed, my "graphics" suck. A few of my friends have promised to draw art, but besides that, the whole generic Java look could go. This is where I was originally planning to put in the sentence "and if you're a Java graphics whiz and want to help fix that, the current source code is conveniently available at GitHub", but then getting things to this point took longer than I expected and I didn't have the time to actually figure out how the whole Eclipse-GitHub integration works. I'll get to that soon. GitHub link here!

I also want to make the nodes more informative - right now they only show their marginal probability. Ideally, clicking on them would expand them to a representation where you could visually see what components their probability is composed of. I've got some scribbled sketches of what this should look like for various node types, but none of that is implemented yet.

I expect some of you to also note that the actual Bayes theorem hasn't shown up yet, at least in no form resembling the classic mammography problem. (It is used implicitly in the network belief updates, though.) That's intentional - there will be a third minigame involving that form of the theorem, but somehow it felt more natural to start this way, to give the player a rough feeling of how probability flows through Bayesian networks. Admittedly I'm not sure of how well that's happening so far, but hopefully more minigames should help the player figure it out better.
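For readers who haven't seen the classic form of the theorem, here is a minimal sketch of the mammography-style calculation in Python. It is not taken from the game's Java code; the function name `posterior` and the numbers (1% prior, 80% true positive rate, 9.6% false positive rate) are just the commonly quoted illustrative figures, assumed here for the example.

```python
def posterior(prior, p_pos_given_true, p_pos_given_false):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E).

    prior              -- P(H), probability of the hypothesis before the evidence
    p_pos_given_true   -- P(E | H), chance of a positive result if H is true
    p_pos_given_false  -- P(E | not H), chance of a positive result if H is false
    """
    # Total probability of seeing the evidence at all.
    evidence = p_pos_given_true * prior + p_pos_given_false * (1 - prior)
    return p_pos_given_true * prior / evidence


# Illustrative numbers only (assumed, not from the game).
p = posterior(prior=0.01, p_pos_given_true=0.80, p_pos_given_false=0.096)
print(f"{p:.3f}")  # prints 0.078
```

The point of the example is the usual one: even with a fairly accurate test, a low prior keeps the posterior low, which is the same kind of update the networks perform implicitly.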

What's next? Once the main character (who needs a name) manages to get into the Academy, there will be a lot of social scheming, and many mysteries to solve in order for her to find out just what did happen to her brother... also, I don't mind people suggesting things, such as what could happen next, and what kinds of network configurations the character might face in different minigames.

(Also, everything that you've seen might get thrown out and rewritten if I decide it's no good. Let me know what you think of the stuff so far!)

My new paper: Concept learning for safe autonomous AI

18 Kaj_Sotala 15 November 2014 07:17AM

Abstract: Sophisticated autonomous AI may need to base its behavior on fuzzy concepts that cannot be rigorously defined, such as well-being or rights. Obtaining desired AI behavior requires a way to accurately specify these concepts. We review some evidence suggesting that the human brain generates its concepts using a relatively limited set of rules and mechanisms. This suggests that it might be feasible to build AI systems that use similar criteria and mechanisms for generating their own concepts, and could thus learn similar concepts as humans do. We discuss this possibility, and also consider possible complications arising from the embodied nature of human thought, possible evolutionary vestiges in cognition, the social nature of concepts, and the need to compare conceptual representations between humans and AI systems.

I just got word that this paper was accepted for the AAAI-15 Workshop on AI and Ethics: I've uploaded a preprint here. I'm hoping that this could help seed a possibly valuable new subfield of FAI research. Thanks to Steve Rayhawk for invaluable assistance while I was writing this paper: it probably wouldn't have gotten done without his feedback motivating me to work on this.

Comments welcome. 

[meta] New LW moderator: Viliam_Bur

39 Kaj_Sotala 13 September 2014 01:37PM

Some time back, I wrote that I was unwilling to continue with investigations into mass downvoting, and asked people for suggestions on how to deal with them from now on. The top-voted proposal in that thread suggested making Viliam_Bur into a moderator, and Viliam graciously accepted the nomination. So I have given him moderator privileges and also put him in contact with jackk, who provided me with the information necessary to deal with the previous cases. Future requests about mass downvote investigations should be directed to Viliam.

Thanks a lot for agreeing to take up this responsibility, Viliam! It's not an easy one, but I'm very grateful that you're willing to do it. Please post a comment here so that we can reward you with some extra upvotes. :)

I'm holding a birthday fundraiser

23 Kaj_Sotala 05 September 2014 12:38PM

EDIT: The fundraiser was successfully completed, raising the full $500 for worthwhile charities. Yay!

Today's my birthday! And per Peter Hurford's suggestion, I'm holding a birthday fundraiser to help raise money for MIRI, GiveDirectly, and Mercy for Animals. If you like my activity on LW or elsewhere, please consider giving a few dollars to one of these organizations via the fundraiser page. You can specify which organization you wish to donate to in the comment of the donation, or just leave it unspecified, in which case I'll give your donation to MIRI.

If you don't happen to be particularly altruistically motivated, just consider it a birthday gift to me - it will give me warm fuzzies to know that I helped move money for worthy organizations. And if you are altruistically motivated but don't care about me in particular, maybe you still can get yourself to donate more than usual by hacky stuff like someone you know on the Internet having a birthday. :)

If someone else wants to hold their own birthday fundraiser, here are some tips: birthday fundraisers.

[meta] Future moderation and investigation of downvote abuse cases, or, I don't want to deal with this stuff

45 Kaj_Sotala 17 August 2014 02:40PM

Since the episode with Eugine_Nier, I have received three private messages from different people asking me to investigate various cases of suspected mass downvoting. And to be quite honest, I don't want to deal with this. Eugine's case was relatively clear-cut, since he had engaged in systematic downvoting on a massive scale, but the new situations are a lot fuzzier and I'm not sure what exactly the rules should be (what counts as a permitted use of the downvote system and what doesn't?).

At least one person has also privately contacted me and offered to carry out moderator duties if I don't want them, but even if I told them yes (on what basis? why them and not someone else?), I don't know what kind of policy I should tell them to enforce. I only happened to be appointed a moderator because I was in the list of top 10 posters at a particular time, and I don't feel like I should have any particular authority to make the rules. Nor do I feel like I have any good idea of what the rules should be, or who would be the right person to enforce them.

In any case, I don't want to be doing this job, nor do I particularly feel like being responsible for figuring out who should, or how, or what the heck. I've already started visiting LW less often because I dread having new investigation requests to deal with. So if you folks could be so kind as to figure it out without my involvement? If there's a clear consensus that someone in particular should deal with this, I can give them mod powers, or something.

[moderator action] Eugine_Nier is now banned for mass downvote harassment

100 Kaj_Sotala 03 July 2014 12:04PM

As previously discussed, on June 6th I received a message from jackk, a Trike Admin. He reported that the user Jiro had asked Trike to carry out an investigation into the retributive downvoting that Jiro had been subjected to. The investigation revealed that the user Eugine_Nier had downvoted over half of Jiro's comments, amounting to hundreds of downvotes.

I asked the community's guidance on dealing with the issue, and while the matter was being discussed, I also reviewed previous discussions about mass downvoting and looked for other people who mentioned being the victims of it. I asked Jack to compile reports on several other users who mentioned having been mass-downvoted, and it turned out that Eugine was also overwhelmingly the biggest downvoter of users David_Gerard, daenarys, falenas108, ialdabaoth, shminux, and Tenoke. As this discussion was going on, it turned out that user Ander had also been targeted by Eugine.

I sent two messages to Eugine, requesting an explanation. I received a response today. Eugine admitted his guilt, expressing the opinion that LW's karma system was failing to carry out its purpose of keeping out weak material and that he was engaged in a "weeding" of users who he did not think displayed sufficient rationality.

Needless to say, it is not the place of individual users to unilaterally decide that someone else should be "weeded" out of the community. The Less Wrong content deletion policy contains this clause:

Harassment of individual users.

If we determine that you're e.g. following a particular user around and leaving insulting comments to them, we reserve the right to delete those comments. (This has happened extremely rarely.)

Although the wording does not explicitly mention downvoting, harassment by downvoting is still harassment. Several users have indicated that they have experienced considerable emotional anguish from the harassment, and have in some cases been discouraged from using Less Wrong at all. This is not a desirable state of affairs, to say the least.

I was originally given my moderator powers on a rather ad-hoc basis, with someone awarding mod privileges to the ten users with the highest karma at the time. The original purpose for that appointment was just to delete spam. Nonetheless, since retributive downvoting has been a clear problem for the community, I asked the community for guidance on dealing with the issue. The rough consensus of the responses seemed to authorize me to deal with the problem as I deemed appropriate.

The fact that Eugine remained quiet about his guilt until directly confronted with the evidence, despite several public discussions of the issue, is indicative of him realizing that he was breaking prevailing social norms. Eugine's actions have worsened the atmosphere of this site, and that atmosphere will remain troubled for as long as he is allowed to remain here.

Therefore, I now announce that Eugine_Nier is permanently banned from posting on LessWrong. This decision is final and will not be changed in response to possible follow-up objections.

Unfortunately, it looks like while a ban prevents posting, it does not actually block a user from casting votes. I have asked jackk to look into the matter and find a way to actually stop the downvoting. Jack indicated earlier on that it would be technically straightforward to apply a negative karma modifier to Eugine's account, and wiping out Eugine's karma balance would prevent him from casting future downvotes. Whatever the easiest solution is, it will be applied as soon as possible.

EDIT 24 July 2014: Banned users are now prohibited from voting.

[meta] Policy for dealing with users suspected/guilty of mass-downvote harassment?

28 Kaj_Sotala 06 June 2014 05:46AM

Below is a message I just got from jackk. Some specifics have been redacted 1) so that we can discuss general policy rather than the details of this specific case 2) because of the presumption of innocence, just in case there happens to be an innocuous explanation for this.

Hi Kaj_Sotala,

I'm Jack, one of the Trike devs. I'm messaging you because you're the moderator who commented most recently. A while back the user [REDACTED 1] asked if Trike could look into retributive downvoting against his account. I've done that, and it looks like [REDACTED 2] has downvoted at least [over half of REDACTED 1's comments, amounting to hundreds of downvotes] ([REDACTED 1]'s next-largest downvoter is [REDACTED 3] at -15).

What action to take is a community problem, not a technical one, so we'd rather leave that up to the moderators. Some options:

1. Ask [REDACTED 2] for the story behind these votes
2. Use the "admin" account (which exists for sending scripted messages, &c.) to apply an upvote to each downvoted post
3. Apply a karma award to [REDACTED 1]'s account. This would fix the karma damage but not the sorting of individual comments
4. Apply a negative karma award to [REDACTED 2]'s account. This makes him pay for false downvotes twice over. This isn't possible in the current code, but it's an easy fix
5. Ban [REDACTED 2]

For future reference, it's very easy for Trike to look at who downvoted someone's account, so if you get questions about downvoting in the future I can run the same report.

If you need to verify my identity before you take action, let me know and we'll work something out.

-- Jack

So... thoughts? I have mod powers, but when I was granted them I was basically just told to use them to fight spam; there was never any discussion of any other policy, and I don't feel like I have the authority to decide on the suitable course of action without consulting the rest of the community.
