New(ish) AI control ideas

5 Stuart_Armstrong 05 March 2015 05:03PM

I recently went on a two-day, intense, solitary "AI control retreat", with the aim of generating new ideas for making safe AI. The "retreat" format wasn't really a success ("focused uninterrupted thought" was the main gain, not "two days of solitude" - it would have been more effective in three-hour sessions), but I did manage to generate a lot of new ideas. These ideas will now go before the baying bloodthirsty audience (that's you, folks) to test them for viability.

To provide inspiration and direction to my thought process, I first listed all the easy responses that we generally give to most proposals for AI control. If someone comes up with a new/old brilliant idea for AI control, it can normally be dismissed by appealing to one of these responses:

  1. The AI is much smarter than us.
  2. It’s not well defined.
  3. The setup can be hacked.
    • By the agent.
    • By outsiders, including other AIs.
    • Adding restrictions encourages the AI to hack them, not obey them.
  4. The agent will resist changes.
  5. Humans can be manipulated, hacked, or seduced.
  6. The design is not stable.
    • Under self-modification.
    • Under subagent creation.
  7. Unrestricted search is dangerous.
  8. The agent has, or will develop, dangerous goals.

HPMOR Wrap Parties: Resources, Information and Discussion

9 Habryka 04 March 2015 07:49PM

Harry Potter and the Methods of Rationality - Wrap Party Summary Thread

As many of you probably read in the HPMOR author's note last month, I am the coordinator of the HPMOR Wrap parties. Many of you have reached out to me, I have put hundreds of you in contact with each other, and over 20 parties on 4 continents are now going to happen. Now it is time to maximize attendance at the events, make sure that we all get the most out of them, and use the momentum that HPMOR has brought this community. This post will serve as a central location for all information and resources available for the parties, as well as a place for discussion in the comments.


I set up a few different systems to coordinate everyone, and make it easier for everyone interested in the wrap parties to connect. Here they are: 

The Map:

This map can help you get a quick overview of how many people in your area are strongly interested, and who might help you with organizing an event. Remember that not even half of the people currently RSVP'd for Facebook events have added themselves to the map, so this map is the absolute minimum level of engagement in your area. I will be adding all events to the map as they are posted in the Facebook group. Please add yourself to the map if you can! (But please be careful to not destroy the pins of anyone else, to use the correct pin type, and to not create any empty pins.)

The Facebook Group:

This is the main location for discussion of the wrap parties and also the place where all of the events are conveniently collected. You can find all events under the "Events" tab, and if you add your own event in this group you can conveniently invite everyone who has added themselves to this group. I would still advise you to also invite all of your friends who might be interested, since they might not have joined the group.

The Organizer Mailing List: 

This mailing list is the fastest way for me to reach all of the organizers at the same time, and also the fastest way for all organizers to be kept up to speed with the newest resources available. Use this mailing list to discuss ideas and get help from other organizers. 


To help you quickly get a sense of whether there is a party happening in your area, here is a list of all the parties that I have so far learned about, with links to their respective Facebook events. Currently everything is on Facebook because that is much easier to coordinate, but I will try to add contact information for organizers for all of these parties very soon, so that people without Facebook can easily find the information that they need:

Parties in the U.S.:

  1. Berkeley, California
  2. Mountain View, California
  3. Phoenix, Arizona
  4. Washington DC
  5. Portland, Oregon
  6. New Orleans, Louisiana
  7. Sarasota, Florida
  8. Denver, Colorado
  9. Lawrence, Kansas
  10. Seattle, Washington
  11. MIT, Massachusetts [14th of March]
  12. Cambridge, Massachusetts [15th of March]
  13. New York

Parties outside of the U.S.: 

Please comment on this thread, send me a message, or add an event to the Facebook page (and invite me) to add your own party to this list!



To help everyone get their party started, Brayden McLean compiled a wonderful handbook for party organizers: 


Free Books:

We are providing free copies of the first 17 chapters of HPMOR to all parties in the U.S.! Just fill out this form today or tomorrow, and we will try to send you as many copies as you think you will need to hook all of your friends. 


More resources are soon to come, and I will keep this post updated with everything that is sent to me. 


Neutral hours: a tool for valuing time

6 owencb 04 March 2015 04:38PM

Prioritisation is mostly about working out how to trade different resources off against one another. Prioritisation problems come at different scales: for individuals, for companies or organisations, for the world at large. At the Global Priorities Project we’re mostly interested in the large-scale questions. But we sometimes have something to say about smaller scale problems, too.

I’ve just tidied and released old research notes (mostly from 2013) on the personal prioritisation problem of how to value time spent on different activities. This is primarily of use for individuals making decisions about how to spend their time, money, and mental energy.

Abstract: We get lots of opportunities to convert between time and money, and it’s hard to know which ones to take, since they use up other mental resources. I introduce the neutral hour as a tool for thinking about how to make these comparisons. A neutral hour is an hour spent such that your mental energy is at the same level at the start and the end. I work through some examples of how to use this tool, look at implications for some common scenarios, and explore the theory behind them.

There may be benefits for broader prioritisation questions. Since societies are composed of individuals, it could help to know how to value time savings or costs to individuals when performing cost-benefit analysis on larger projects. And there may be techniques for comparing between different resources that we could usefully apply in wider contexts. However, we think these benefits are secondary. We’re releasing this work now to let others take advantage of it, either for personal benefit or to build on it and release easier-to-use guidance or tools.

You can find the full document here. I'm happy to answer questions and I'd love to know if people have thoughts on this material.

Meetup : Effective Altruism Meetup Switzerland

1 Serendipity 04 March 2015 01:54PM

WHEN: 21 March 2015 06:00:00PM (+0100)

WHERE: Efringerstrasse 25, 4057 Basel

The Swiss Effective Altruism Movement (EACH) is holding an EA meetup in Basel. This is an event for people who are interested in Effective Altruism to meet up and get to know each other. There will be no formal program but a lot of talking, eating and having fun!

Facebook event: https://www.facebook.com/events/1545987715672064/

Meetup : Washington, D.C.: Fun & Games (with Nomic)

1 RobinZ 04 March 2015 01:53AM

WHEN: 08 March 2015 02:00:00PM (-0500)

WHERE: Reynolds Center

Reminder: Daylight Saving Time begins on Sunday, March 8 at 2 a.m. - please bear this in mind when planning your time of arrival.

We will be meeting in the Kogod Courtyard of the Donald W. Reynolds Center for American Art and Portraiture (8th and F Sts or 8th and G Sts NW, go straight past the information desk from either entrance) to hang out, play games, and engage in fun conversation.

Note: Feel free to send an email to the Google Group - lesswrong-dc - if you have a game you wish to recruit people to play in advance. This is especially useful for games that take 3+ hours, as you would want to start playing by 3:30 to be sure of finishing by 7:00.

Note Also: As there has been some interest in this, we are going to have at least one game of Nomic. Players will probably work out their initial ruleset or rulesets somewhere in the 3:00-4:00 timespan to start playing by 4:00.

Note the Third: Harry Potter and the Methods of Rationality is likely to conclude on March 14th. David Steinberg is working on scheduling an end-of-HPMoR party for the DC area via Facebook; please contact him - david.isaac.steinberg@gmail.com - or me - robin.zimm@gmail.com - if you would (a) prefer that Mini Talks be postponed in favor of end-of-HPMoR or (b) prefer that this not happen.

Greenbelt Station will be closed, with buses replacing trains; trains on many lines will be operating every 15 minutes to accommodate track work. The Verizon Center calendar for March 8 has a Charlie Wilson concert starting at 7 p.m. (doors 6 p.m.).

Upcoming meetups:

  • Mar. 15: Mini Talks OR HPMoR Discussion (short lectures by attendees OR freeform conversation about the story)
  • Mar. 22: Rationalist Taboo OR Mini Talks (discuss concepts without using the conventional words for them OR short lectures by attendees)
  • Mar. 29: Singing Meetup (weather permitting; Fun & Games will be fallback topic)

Keep Your Identity Fluid [LINK]

7 Peter_McIntyre 03 March 2015 03:10AM

Building on Graham's Small Identity, here I look at the hazards of identity, and give a suggestion for leveraging it to your advantage, as well as avoiding pitfalls. 

As per my last article, feel free to let me know what you think here, privately, or anonymously.



Rationality Quotes Thread March 2015

4 Vaniver 02 March 2015 11:38PM

Another month, another rationality quotes thread. The rules are:

  • Please post all quotes separately, so that they can be upvoted or downvoted separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
  • Do not quote yourself.
  • Do not quote from Less Wrong itself, HPMoR, Eliezer Yudkowsky, or Robin Hanson. If you'd like to revive an old quote from one of those sources, please do so here.
  • No more than 5 quotes per person per monthly thread, please.
  • Provide sufficient information (URL, title, date, page number, etc.) to enable a reader to find the place where you read the quote, or its original source if available. Do not quote with only a name.

Announcing the Complice Less Wrong Study Hall

44 malcolmocean 02 March 2015 11:37PM

(If you're familiar with the backstory of the LWSH, you can skip to paragraph 5. If you just want the link to the chat, click here: LWSH on Complice)

The Less Wrong Study Hall was created as a tinychat room in March 2013, following Mqrius and ShannonFriedman's desire to create a virtual context for productivity. In retrospect, I think it's hilarious that a bunch of the comments ended up being a discussion of whether LW had the numbers to get a room that consistently had someone in it. The funny part is that they were based around the assumption that people would spend about 1h/day in it.

Once it was created, it was so effective that people started spending their entire day doing pomodoros (32 minutes of work plus an 8-minute break) in the LWSH, and now often even stay logged in while doing chores away from their computers, just for the cadence of focus and the sense of company. So there's almost always someone there, and often 5-10 people.

A week in, a call was put out for volunteers to program a replacement for the much-maligned tinychat. As it turns out, though, video chat is a hard problem.

So nearly 2 years later, people are still using the tinychat.

But a few weeks ago, I discovered that you can embed the tinychat applet into an arbitrary page. I immediately set out to integrate LWSH into Complice, the productivity app I've been building for over a year, which counts many rationalists among its alpha & beta users.

The focal point of Complice is its today page, which consists of a list of everything you're planning to accomplish that day, colorized by goal. Plus a pomodoro timer. My habit for a long time has been to have this open next to LWSH. So what I basically did was integrate these two pages. On the left, you have a list of your own tasks. On the right, a list of other users in the room, with whatever task they're doing next. Then below all of that, the chatroom.

(Something important to note: I'm not planning to point existing Complice users, who may not be LWers, at the LW Study Hall. Any Complice user can create their own coworking room by going to complice.co/createroom)

With this integration, I've solved a couple of the core problems that people wanted addressed for the study hall:

  • an actual ding sound beyond people typing in the chat
  • synchronized pomodoro time visibility
  • pomos that automatically start, so breaks don't run over
  • Intentions — what am I working on this pomo?
  • a list of what other users are working on
  • the ability to show off how many pomos you've done
  • better welcoming & explanation of group norms
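
To illustrate the synchronized-timer idea in that list, here is a minimal sketch of how every client could derive the same pomodoro phase from the wall clock alone. It is only an illustration under stated assumptions, not a description of how Complice actually does it: the function name `pomodoro_phase` and the choice to anchor cycles at the Unix epoch are hypothetical, and only the 32/8 minute split comes from the study hall's convention.

```python
# Hypothetical sketch: derive a shared pomodoro phase from the clock.
# Assumption: cycles are anchored at the Unix epoch, so any two clients
# with roughly synchronized clocks compute the same phase independently.

WORK_MIN = 32   # work period used in the LWSH
BREAK_MIN = 8   # break period used in the LWSH
CYCLE_SEC = (WORK_MIN + BREAK_MIN) * 60


def pomodoro_phase(unix_seconds):
    """Return (phase, seconds_remaining) for the given Unix timestamp."""
    offset = int(unix_seconds) % CYCLE_SEC  # position inside the 40-minute cycle
    work_sec = WORK_MIN * 60
    if offset < work_sec:
        return ("work", work_sec - offset)
    return ("break", CYCLE_SEC - offset)
```

Calling `pomodoro_phase(time.time())` on each client would then give everyone the same phase and countdown, and the next pomo starts automatically when the break's remaining time reaches zero.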

There are a couple other requested features that I can definitely solve but decided could come after this launch:

  • rooms with different pomodoro durations
  • member profiles
  • the ability to precommit to showing up at a certain time (just wait'll I connect with Beeminder ;) )

The following points were brought up in the Programming the LW Study Hall post or on the List of desired features on the github/nnmm/lwsh wiki, but can't be fixed without replacing tinychat:

  • efficient with respect to bandwidth and CPU
  • page layout with videos lined up down the left for use on the side of monitors
  • chat history
  • encryption
  • everything else that generally sucks about tinychat

It's also worth noting that if you were to think of the entirety of Complice as an addition to LWSH... well, it would definitely look like feature creep, but at any rate there would be several other notable improvements:

  • daily emails prompting you to decide what you're going to do that day
  • a historical record of what you've done, with guided weekly, monthly, and yearly reviews
  • optional accountability partner who gets emails with what you've done every day (the LWSH might be a great place to find partners!)

So, if you haven't clicked the link already, check out: complice.co/room/lesswrong

(This article is posted to Main because that's where the rest of the LWSH posts are, and this represents a substantial update.)

Meetup : HPMOR/Other media discussion

1 rocurley 02 March 2015 05:35AM

WHEN: 02 March 2015 06:15:00PM (-0800)

WHERE: 1061 Market St #4, San Francisco, CA 94103

This is the first meetup of the month and is therefore the Schelling meetup: If you want to come once a month, now's the time. We'll be meeting to talk about Methods (in particular, the recent puzzle). Probably not everyone's read Methods, so we'll be talking about other media as well. As always for discussion topic meetups, this is just the Schelling topic, and you're free to talk about whatever you want.

Meetup : App Academy Discussion

1 Danny_Hintze 02 March 2015 04:52AM

WHEN: 08 March 2015 02:00:00PM (-0700)

WHERE: 300 East Orange Mall, Tempe, AZ 85281

Jay is going to fill us in on his recent App Academy experiences.

Text or call 602 501 9420. We usually meet up in one of the meeting rooms in the sub-basement (yes, there is a basement to the basement) of Hayden Library, so it can be hard to find the first time you try to meet up with us.

