[LINK] Concrete problems in AI safety
From the Google Research blog:
We believe that AI technologies are likely to be overwhelmingly useful and beneficial for humanity. But part of being a responsible steward of any new technology is thinking through potential challenges and how best to address any associated risks. So today we’re publishing a technical paper, Concrete Problems in AI Safety, a collaboration among scientists at Google, OpenAI, Stanford and Berkeley.
While possible AI safety risks have received a lot of public attention, most previous discussion has been very hypothetical and speculative. We believe it’s essential to ground concerns in real machine learning research, and to start developing practical approaches for engineering AI systems that operate safely and reliably.
We’ve outlined five problems we think will be very important as we apply AI in more general circumstances. These are all forward thinking, long-term research questions -- minor issues today, but important to address for future systems:
- Avoiding Negative Side Effects: How can we ensure that an AI system will not disturb its environment in negative ways while pursuing its goals, e.g. a cleaning robot knocking over a vase because it can clean faster by doing so?
- Avoiding Reward Hacking: How can we avoid gaming of the reward function? For example, we don’t want this cleaning robot simply covering over messes with materials it can’t see through.
- Scalable Oversight: How can we efficiently ensure that a given AI system respects aspects of the objective that are too expensive to be frequently evaluated during training? For example, if an AI system gets human feedback as it performs a task, it needs to use that feedback efficiently because asking too often would be annoying.
- Safe Exploration: How do we ensure that an AI system doesn’t make exploratory moves with very negative repercussions? For example, maybe a cleaning robot should experiment with mopping strategies, but clearly it shouldn’t try putting a wet mop in an electrical outlet.
- Robustness to Distributional Shift: How do we ensure that an AI system recognizes, and behaves robustly, when it’s in an environment very different from its training environment? For example, heuristics learned for a factory workfloor may not be safe enough for an office.
We go into more technical detail in the paper. The machine learning research community has already thought quite a bit about most of these problems and many related issues, but we think there’s a lot more work to be done.
We believe in rigorous, open, cross-institution work on how to build machine learning systems that work as intended. We’re eager to continue our collaborations with other research groups to make positive progress on AI.
Rudeness
I'm really new to this site, and I'm hitting a problem I've experienced in other aspects of my life as a student, employee and comedic performer: I'm extremely rude. I don't realize it at the time, thinking that I'm just being blunt, forceful and direct. In the sense that those can all describe similar things, then yeah! Well, confidence in myself is a great asset, and I've turned it to positive effect, especially when I need to intimidate someone with a roguish smile and a calm, iron-hard assertion backed up by a blistering intensity (I sound like a Marty Stu right now. Draco will want his leather pants back.)
But making a rationally sound argument should not be about winning. It should be about accuracy, clarity and sanity. If you disagree with me on something, I need to remind myself not to automatically fight. What good is being alpha when I'm ignoring my confusion and avoiding my embarrassment at possibly being mistaken? Not to mention that this is an internet forum, so the limitations of the medium mean that attempting to look like a tough guy winds up hollow and sad, like a chocolate Easter bunny that's gone off.
Okay, I'm not one for similes, but I am one for trying to make myself more sane.
Strategic Bestseller: Taking the Blog Path (4HS002)
Follow-up to: How can I strategically write a complex bestseller?
"The scariest moment is just before you start." "I think timid writers like the passive voice for the same reason timid lovers like passive partners. The passive voice is safe." - Stephen King
2:27 PM, Mexico City, 08 July 2013
The Blog Path and the Time Dimension
Robin Hanson recently said that writing a book feels lonelier than writing blog posts. Blog posts have many features that books will never have. Not only the obvious ones, such as instantaneous gratification, being able to complete a chunk of work in one sitting, and being able to show you are actually doing something, not just claiming you are. Blogs also partition time in a way that makes a primate brain comfortable, from both the reader's and the writer's perspective. But in my case the most important feature of blogs is that generate-and-test and trial-and-error are easy to do. So after my first post here, and weeks spent solving many of the surrounding problems that could have kept me from moving forward, I decided to take the beaten track and blog my way into a bestseller.
The Challenges Theme
The theme of the blog is self-challenges, and it is aimed at readers who enjoy Saturday Morning Breakfast Cereal with a side of A. J. Jacobs. It begins with #50: Stop Learning, Start Doing.
This is the first post, so let’s cut to the chase: In this blog we’ll be going through a series of 50 challenges. Whatever you want to do, let’s do this together. You like A. J. Jacobs and Tim Ferriss? That’s a good start. You want to deal with your big picture question too? On top of that you like Science and Philosophy? You’ve come to the right place, but don’t take a seat yet, this is not a place to rest your gaze and get your warm fuzzy feeling inside by making a comment. This is a place to do.
All you’ll need prior to reading this blog is linked below:
You want to be one of the few Self-Actualizers out there? This won't be easy, and though we'll make the journey together, no one besides you can do it for you.
But before we start, there are six things you need to know, and they're gonna hurt like few things you (... and it continues from here)
Previous LW Post Comments (ordered by upvotes):
Omid said that if writing is like music, being a bestseller is mostly about luck. Partially (0.5) I concur, but it seems to me that randomness in music interest is mostly dominated by prestige considerations from separate domains.
Trevor Blake told his story and made clear that only writing is writing: talking about it, or even what I'm doing here, writing about it, isn't. That seemed like an important downward spiral to keep track of. It explains why these LW posts will be less frequent than I thought before.
Gwern and Pjeby had a long discussion about book stats and likelihoods of making bestselling lists. It is clear that it is very hard. But it made me feel it is less hard than I thought before.
ChristianKI asked the words-per-day question. I'll respond by saying that I read and write more than six hours a day, which is Stephen King's suggested time in his On Writing.
Michaelos and Qiaochu_Yuan suggested a mixed nagging strategy, getting someone close to me to nag me about writing while also beeminding it. This seems very important. Beeminder is set, and feel free to nag me in private messages if you are reading this far from the date it was written. I'll get a friend whom I see a lot, and a sexy lady, and an authority figure, to nag me every once in a while. So whether I'm feeling gregarious, romantically infatuated or seeking validation, there will always be a chance that writing is the emotionally correct thing to do.
Finally, and of course there will never be time to respond to every comment here, though there might be on the blog itself: Viliam_Bur devised "on the fly" a strategy that nicely coincides with what I'm doing, except that I'll outline the book after a few blog posts, now that this new "blog post" element has entered the book agenda. Viliam also mentioned that humans love reading stories, and the blog's next post will be one of my stories.
Getting informed about what does and doesn't work
Last post I said this post would contain a few things, among them "(d) Gather that information" from Salamon's list of strategic things to do when pursuing a project. From the information I've gotten so far, including comments, posts on LW and emails to authors, the things that improve the selling odds of non-fiction are, in no particular order:
1) Being famous
2) Writing a lot
3) Luck
4) Being a professor in a prestigious university
5) Passion
6) A wide circle of influence
7) Having 1,000 true fans, who'll buy your stuff because it is yours
8) Knowing your grammar, and when to ignore it
9) Ignoring 80% of the criticism you receive
(those would be the "If I can't have it, neither can you" kind of critics, or just naturally spiteful individuals)
10) Paying five times more attention to the remaining 20%
11) If your reader says your writing is confusing, it is, by definition, confusing
12) Dealing with topics in a way that interests many, but focusing on your idealized one reader
13) Understanding that, lacking the level of obsession and resources used to promote The Four Hour Workweek, the journey could be as long as writing three or four books before making it big, or five hundred blog posts. It helps that I'm riding the Four Hour brand.
14) Using your strengths however you can
In my case I intend to use my "sure, naked dancing in public citing horoscopes sounds ok to me" strength, and also however many stories of unbelievable days this lack of embarrassment has given me.
Next LW Post
Before the next LW post I intend to copy Svi's idea of using TDT for a personal hacking experiment, and also do the same thing with other unusual ideas that pop up on LW frequently. Instead of taking advice from something on LW that is specifically about strategic thinking, which I'm already doing with Great Courses lectures plus Salamon's post, I'll just try to see how to apply things like TDT, Everett, Timeless Physics, AIXI, Newcomb, the Iterated Prisoner's Dilemma and PrudentBot to effective writing. Or should I call it effective bestselling, now that I know writing itself is but the tip of the iceberg? I have no idea how to do that transposition, but when I was last here I made my goals public, and now it doesn't seem that embarrassing to do it anymore.
Last, I ask a favor with a story:
There is one domain I never felt like learning more about. Seeking truth is a noble goal, but some truths are information hazards, and I always had the impression that music, for me, was a dark terrain. It feels like the more I know (starting from almost nothing) about music, about structure, math, chords, composition, harmony, style, the more it all boils down to "unweaving the rainbow" in Dawkins' parlance. It detracts from the experience. Going to a music show is torture for me, for the last thing I want to associate music with is a bunch of humans making coordinated physical motions on complex devices that cause the air to oscillate. I want music to be what makes my eyes teary when a Miyazaki character finally saves the forgotten forest from the mountain spirit. Music should be a memoir of my grandma bringing me to bed as a child while Vivaldi's Spring surrounded the bed. By the same token, there are many details of people's lives we are better off unaware of, and in the case of a blogger or a writer, you frequently just don't want to know the details: how easy or hard it was for her to write, or how long she usually takes in the shower. Most people are not hardcore epistemic rationalists, and I'd prefer that they didn't find any link, mention or pointer from the blog comments to the LW posts about it. Perhaps not so much in this community, but mystery is, and will forever remain, an important component of excitement and interest.
I'll finish off as I did before, by mentioning what this is all about: I don't know which LW posts contain the most compact, memorable or effective techniques for winning at being strategic, but I'm hoping by the end of this process the territory is better mapped for those who'd like to follow suit. Or point and laugh.
Wikifying the blog list
Konkvistador's excellent List of Blogs by LWers led me to some of my favorite blogs, but it is pretty well hidden and gradually becoming obsolete. In order to create an easily updatable replacement, I have created the wiki page List of Blogs and added most of the blogs from Konkvistador's list. If you have a blog, or you read blogs, please help in the following ways:
-- Add your blog if it's not on there, and if it has updated in the past few months (no dead blogs this time, exceptions for very complete archives of excellent material like Common Sense Atheism in the last section)
-- Add any other blogs you like that are written by LWers or frequently engage with LW ideas
-- Remove your blog if you don't want it on there (I added some prominent critics of LW ideas who might not want to be linked to us)
-- Move your blog to a different category if you don't like the one it's in right now
-- Add a description of your blog, or change the one that already exists
-- Change the name you're listed by (I defaulted to people's LW handles)
-- Bold the name of your blog if it updates near-daily, has a large readership/commentership, and/or gets linked to on LW a lot
-- Improve formatting
Somebody more familiar with the Less Wrong twittersphere might want to do something similar to Grognor's Less Wrong on Twitter.
[Link] More Right launched
Various people (including Konkvistador, who has been talking about it the most) have launched their blog More Right.
"A group blog, More Right is a place to discuss the many things that are touched by politics that we prefer wouldn’t be, as well as right wing ideas in general. It grew out of the correspondences among like minded people in late 2012, who first began their journey studying the findings of modern cognitive science on the failings of human reasoning and ended it reading serious 19th century gentlemen denouncing democracy. Surveying modernity, we found cracks in its façade. Findings and seemingly correct ideas, carefully bolted down and hidden, met with disapproving stares and inarticulate denunciation when unearthed. This only whetted our appetites. Proceeding from the surface to the foundations, we found them lacking. This is reflected in the spirit of the site."
Blogs by LWers
Related to: Wikifying the Blog List
LessWrong posters and readers are generally pretty cool people. Maybe they are interesting bloggers too. And I'm not just talking about rationalist material that we'd ideally like to see cross-posted on LessWrong; no, gardening blogs are also fair game. I'm making this a discussion-level post so more people can see the list. Please share links to blogs by former or current LWers. Surely the authors wouldn't mind; who wouldn't like more readers? Original list here.
Anyone who wants to suggest a new blog for the list please follow this link.
Blogs by LWers:
- RobinHanson --- Overcoming Bias (Katja Grace and Robert Wiblin post here as well)
- Katja Grace --- Meteuphoric (very cool old posts and summaries)
- muflax --- muflax' mindstream, daily
- TGGP --- Entitled To An Opinion
- Yvain --- Jackdaws love my big sphinx of quartz
- juliawise --- Giving Gladly, Radiant Things
- James_G --- Writings
- steven0461 --- Black Belt Bayesian
- James Miller --- Singularity Notes
- Jsalvati --- Good Morning, Economics
- Will Newsome --- Computational Theology
- clarissethorn --- Clarisse Thorn
- Zack M. Davis --- An Algorithmic Lucidity
- Kaj_Sotala --- A view to the gallery of my mind
- SilasBarta --- Setting Things Straight
- tommcabe --- The Rationalist Conspiracy
- Alicorn --- Irregular Updates By An Irregular Person
- MBlume --- Baby, check this out; I've got something to say.
- ciphergoth --- Paul Crowley's blog (mostly about cryonics), Paul Crowley
- XiXiDu --- Alexander Kruel
- Aurini --- Stares At The World
- jkaufman --- Jeff Kaufman
- Bill_McGrath --- billmcgrathmusic
- Sister Y --- the view from hell
- PaulWright --- Paul Wright's blog
- _ozymandias --- http://ozyfrantz.com/
- mstevens --- stdout
- HughRistik --- Feminist Critics
- Julia_Galef --- Measure of Doubt
- NancyLebovitz --- Input Junkie
- David Gerard --- a bunch of them
- Jayson_Virissimo --- Jay, Quantified
- kpreid --- Kevin Reid's blog
- hegemonicon --- Coarse Grained
- Viliam_Bur --- bur.sk
- Emile --- The Rational Parent
- lukeprog --- Common Sense Atheism
- Grognor --- Grognor's Blog
- CarlShulman --- Reflective Disequilibrium
- OrphanWilde --- Aretae
- Alexei --- Bent Spoon Games Blog
- TimS --- Georgia Special Education Law Blog
- loup-valliant --- @ Loup's
- RolfAndreassen --- Yngling Saga
- arundelo --- Aaron Brown
- peter_hurtford --- Greatplay.net
- brilee --- Modern Descartes
- gwern --- gwern.net
- erratio --- The merry-go-round of life
- jimmy --- The Art and Science of Cognitive Engineering
- alexvermeer --- alexvermeer.com
- sark --- sarkology
- gjm --- Scribble, scribble, scribble
- Giles --- Prince Mm Mm
- Chris Hallquist --- The Uncredible Hallq
- EricHerboso --- EricHerboso.org
- Eneasz --- Death Is Bad
- Tuxedage --- Essays and other Musings
- Federico --- studiolo
- Trevor Blake --- OVO; editor, Dora Marsden; lead judge, George Walford International Essay Prize
- Pablo_Stafforini --- Pablo's miscellany
Note: Anyone just digging for interesting blogs to read, who doesn't care whether they are written by LWers, should check out this thread or maybe this one. Did you know we have a wiki article with external resources? We do. Check that out as well. Maybe once we figure out which rationality-related LWer blogs on this list are particularly good, we can add a few of them there too.
[Link]: 80,000 hours blog
Some of you probably aren't aware yet of the rather excellent High Impact Careers / 80,000 hours blog.
It covers topics about how to have the biggest impact with your career, including
- how likely you are to become Prime Minister
- Decision Making under Moral Uncertainty
- Temporal Concerns
- Health vs Education
- Existential Risks
- Startups in the US vs UK
- ... and many more
The contributors include Carl Shulman, Will Crouch, Ben Todd and Katja Grace, with an impressively regular updating schedule at the moment.
The reasoning is obvious in retrospect, but it is useful to have it written down, especially with the research that's gone into the posts; much like the Sequences in that regard.
Link: Facing the Mind-Killer
I've long opposed discussing politics on Less Wrong. Elsewhere, however, I have been known to gaze into the abyss; and so it came to be that I wrote a handful of blog posts for the Oxford Libertarian Society Blog. I had the deliberate intention of bringing a little bit of rationality into politics, and so of course ended up writing in something like Eliezer's style.
I wanted to establish some theory first, so the initial posts were about The Conservation of Expected Evidence and Reductionism, and then one particular Death-Spiral.
As you'll probably notice, one of my defences against the little-death has been to err on the side of attacking Libertarian positions; I provided an account of Traditional Socialist Values so we remember that our enemies aren't inherently evil, and then analysed an abuse of The Law of Comparative Advantage, showing cases where it didn't apply.
I can't promise I'll update at all regularly.
Post inspired by Will Newsome and prompted by Vladimir Nesov.