These are quick notes on an idea for an indirect strategy to increase the likelihood of society acquiring robustly safe and beneficial AI.

 

Motivation:

  • Most challenges we can approach with trial-and-error, so many of our habits and social structures are set up to encourage this. There are some challenges where we may not get this opportunity, and it could be very helpful to know what methods help you to tackle a complex challenge that you need to get right first time.

  • Giving an artificial intelligence good values may be a particularly important challenge, and one where we need to be correct first time. (Distinct from creating systems that act intelligently at all, which can be done by trial and error.)

  • Building stronger societal knowledge about how to approach such problems may make us more robustly prepared for such challenges. Having more programmers in the AI field familiar with the techniques is likely to be particularly important.

 

Idea: Develop methods for training people to write code without bugs.

  • Trying to teach the skill of getting things right first time.

  • Writing or editing code that has to be bug-free without any testing is a fairly easy challenge to set up, and has several of the right kind of properties. There are some parallels between value specification and programming.

  • The set-up puts people in scenarios where they get only one chance -- no opportunity to test part or all of the code, just close analysis before submitting.

    • Interested in personal habits as well as social norms or procedures that help this.

      • Daniel Dewey points to standards for code on the space shuttle as a good example of getting high reliability code edits.

 

How to implement:

  • Ideal: Offer this training to staff at software companies, for profit.

    • Although it’s teaching a skill under artificial hardship, it seems plausible that it could teach enough good habits and lines of thinking to noticeably increase productivity, so people would be willing to pay for this.

    • Because such training could create social value in the short run, this might give a good opportunity to launch as a business that is simultaneously doing valuable direct work.

    • Similarly, there might be a market for a consultancy that helped organisations to get general tasks right the first time, if we knew how to teach that skill.

  • More funding-intensive, less labour-intensive: run competitions with cash prizes.

    • Try to establish it as something like a competitive sport for teams.

    • Outsource the work of determining good methods to the contestants.

 

This is all quite preliminary and I’d love to get more thoughts on it. I offer up this idea because I think it would be valuable, but it’s not my comparative advantage. If anyone is interested in a project in this direction, I’m very happy to talk about it.

Comments:
Shmi:

Having been in the industry for longer than most people around here, I am fairly confident that what you are suggesting is, for all practical purposes, impossible. I mean, it's possible to run the training and set up the prizes; it's just impossible to write bug-free code of any complexity.

If you look through the history of such attempts (e.g. Dijkstra's formal verification, various procedural and functional languages specifically developed for this purpose), even the successful ones failed at the stated goal of producing bug-free code.

As Gram_Stone mentioned earlier, the reason is nearly always the same: humans are incapable of producing bug-free non-trivial formal requirements. And that is something you cannot teach without being a fraud, because the human mind itself is too weak and buggy to analyze a sufficiently complex problem.

For more than a decade I have been systematically identifying error-prone programming habits—by reviewing the literature, by analyzing other people’s mistakes, and by analyzing my own mistakes—and redesigning my programming environment to eliminate those habits. For example, “escape” mechanisms, such as backslashes in various network protocols and % in printf, are error-prone: it’s too easy to feed “normal” strings to those functions and forget about the escape mechanism.

I switched long ago to explicit tagging of “normal” strings; the resulting APIs are wordy but no longer error-prone. The combined result of many such changes has been a drastic reduction in my own error rate. Starting in 1997, I offered $500 to the first person to publish a verifiable security hole in the latest version of qmail, my Internet-mail transfer agent; see http://cr.yp.to/qmail/guarantee.html. There are now more than a million Internet SMTP servers running qmail. Nobody has attempted to claim the $500. Starting in 2000, I made a similar $500 offer for djbdns, my DNS software; see http://cr.yp.to/djbdns/guarantee.html. This program is now used to publish the IP addresses for two million .com domains: citysearch.com, for example, and lycos.com. Nobody has attempted to claim the $500.

There were several non-security bugs in qmail, and a few in djbdns. My error rate has continued to drop since then. I’m no longer surprised to whip up a several-thousand-line program and have it pass all its tests the first time.

Bug-elimination research, like other user-interface research, is highly nonmathematical.

The goal is to have users, in this case programmers, make as few mistakes as possible in achieving their desired effects. We don’t have any way to model this—to model human psychology—except by experiment. We can’t even recognize mistakes without a human’s help. (If you can write a program to recognize a class of mistakes, great—we’ll incorporate your program into the user interface, eliminating those mistakes—but we still won’t be able to recognize the remaining mistakes.) I’ve seen many mathematicians bothered by this lack of formalization; they ask nonsensical questions like “How can you prove that you don’t have any bugs?” So I sneak out of the department, take off my mathematician’s hat, and continue making progress towards the goal.

http://cr.yp.to/cv/activities-20050107.pdf (apparently this guy's accomplishments are legendary in crypto circles)

http://www.fastcompany.com/28121/they-write-right-stuff
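
To make the escape-mechanism point above concrete, here is a minimal Python sketch of the "explicit tagging" idea. The names and API are invented for illustration and are not Bernstein's actual design; the point is just that once every literal must be tagged, there is no escape mechanism left to forget.

```python
# Minimal sketch of "explicit tagging" (invented names, not Bernstein's API).
class Literal:
    """A string that is always emitted verbatim, never interpreted."""
    def __init__(self, text: str):
        self.text = text

def render(*parts: Literal) -> str:
    # Untagged strings are rejected immediately, rather than being silently
    # run through an escape mechanism the caller may have forgotten about.
    for part in parts:
        if not isinstance(part, Literal):
            raise TypeError("render() only accepts tagged Literal values")
    return "".join(part.text for part in parts)

user_input = "100% organic"
# Error-prone style: treating user_input as a format string ("%"-interpolation)
# would crash or corrupt output, because '%' is an escape character there.
# Tagged style: wordier, but there is nothing to forget.
print(render(Literal("LOG: "), Literal(user_input)))  # -> LOG: 100% organic
```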

Personal experience: I found that I was able to reduce my bug rate pretty dramatically through application of moderate effort (~6 months of paying attention to what I was doing and trying to improve my workflow without doing anything advanced like screencasting myself or even taking dedicated self-improvement time), and I think it could probably be reduced even further by adding many layers of process.

In any case, I think it makes sense to favor the development of bug reduction techs like version control, testing systems, type systems, etc. as part of a broad program of differential technological development. (I wonder how far you could go by analyzing almost every AGI failure mode as a bug of some sort, in the "do what I mean, not what I say" sense. The key issues being that bugs don't always manifest instantly and sometimes change behavior subtly instead of immediately halting program execution. Maybe the "superintelligence would have tricky bugs" framing would be an easier sell for AI risks to computer scientists. The view would imply that we need to learn to write bug free code, including anticipating & preventing all AGI-specific bugs like wireheading, before building an AGI.)
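
As a small, hedged illustration of the "type systems as bug-reduction tech" point: in Python, distinct NewType wrappers let a checker such as mypy reject a unit-confusion bug before the code ever runs. The function names and numbers below are invented for the example.

```python
# Illustration only: NewType lets mypy catch unit confusion statically.
from typing import NewType

Metres = NewType("Metres", float)
Feet = NewType("Feet", float)

def braking_distance(speed_mps: float, reaction_time_s: float) -> Metres:
    # Toy formula assuming a constant deceleration of 7 m/s^2.
    return Metres(speed_mps * reaction_time_s + (speed_mps ** 2) / (2 * 7.0))

def display_feet(distance: Feet) -> str:
    return f"{distance:.1f} ft"

d = braking_distance(30.0, 1.5)
# display_feet(d)  # mypy error: Metres is not Feet -- the bug never ships.
print(display_feet(Feet(float(d) * 3.281)))  # explicit conversion is required
```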

See also: My proposal for how to structure FAI development.

Thanks, this is a great collection of relevant information.

I agree with your framing of this as differential tech development. Do you have any thoughts on the best routes to push on this?

I will want to think more about framing AGI failures as (subtle) bugs. My initial impression is positive, but I have some worry that it would introduce a new set of misconceptions.

Sorry for the slow reply.

I'm flattered that my thoughts as someone who has no computer science degree and just a couple years of professional programming experience are considered valuable. So here's more info-dumping (to be taken with a grain of salt, like the previous info dump, because I don't know what I don't know):

  • My comment on different sorts of programming, and programming cultures, and how tolerant they are of human error. Quick overview of widely used bug reduction techs (code review and type systems should have been included).

  • Ben Kuhn says that writing machine learning code is unusually unforgiving, which accords well with my view that data science programming is unusually unforgiving (although the reasons aren’t completely the same).

  • Improving the way I managed my working memory seemed important to the way I reduced bugs in my code. I think by default things fall out of your working memory without you noticing, but if you allocate some of your working memory to watching your working memory, you can prevent this and solve problems in a slower but less error-prone way. The subjective sense was something like having a "meditative bulldozer" thinking style where I was absolutely certain of what I had done with each subtask before going on to the next. It's almost exactly equivalent to doing a complicated sequence of algebraic operations correctly on the first try. It seems slower at first, but it's generally faster in the long run, because fixing errors after the fact is quite slow. This sort of perfectionistic attention to detail was actually counterproductive for activities I worked on after quitting my job, like reading marketing books, and I worked to train it out. The feeling was one of switching into a higher mental gear: I could no longer climb steep hills, but I was much faster on level ground. (For reference: my code wasn't completely bug-free by any means, but I did have a reputation on the ops team of deploying unusually reliable data science code; I once joked to my boss on our company bug day that the system I maintained was bug-free and I didn't have anything to do, which caused her to chuckle in assent; and my superiors were sad to see me leave the company.)

  • For our company hack day, I wrote a Sublime Text extension that would observe the files opened and searches performed by the user in order to generate a tree diagram attempting to map the user’s thought process. This seemed helpful for expanding my working memory with regard to a particular set of coding tasks (making a scattered set of coherent modifications in a large, thorny codebase). A rough sketch of this kind of plugin appears after this list.

  • Another thing that seemed useful was noticing how unexpected many of the bugs I put into production were, and trying to think about how I might have anticipated the bug in advance. I noticed that the most respected & senior programmers at the company where I worked all had a level of paranoia that seemed excessive to me as a newbie, but that I gradually learned to appreciate. (I worked at a company where we maintained a web app that was deployed every day, and we had a minimal QA team and no staging of releases, so we engineers had lots of opportunities to screw up.) Over time, I developed a sort of “peripheral vision” or lateral thinking capability that let me foresee some of these unexpected bugs the way the more senior engineers did.
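
A rough, hypothetical reconstruction of the navigation-tracking Sublime Text plugin mentioned above (not the original hack-day code): it only records which file the user jumps to from which other file, and can print the resulting map to the console.

```python
# Hypothetical sketch of a file-navigation tracker for Sublime Text.
# Save under Packages/User/ and call dump_navigation_trace() from the console.
import sublime_plugin

_last_file = None
_edges = {}  # file -> set of files visited directly from it

class NavigationTraceListener(sublime_plugin.EventListener):
    def on_activated(self, view):
        global _last_file
        current = view.file_name()
        if current is None:
            return  # ignore unsaved buffers and panels
        if _last_file and _last_file != current:
            _edges.setdefault(_last_file, set()).add(current)
        _last_file = current

def dump_navigation_trace():
    # Print a crude "thought process" map: each file and where you went next.
    for parent, children in _edges.items():
        print(parent)
        for child in sorted(children):
            print("    -> " + child)
```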

Being stressed out seemed to impede my performance in writing bug-free code significantly (esp. the “peripheral vision” aspect, subjectively). The ideal seems to be a relaxed deep flow state. One of my coworkers was getting chewed out because a system he wrote continued causing nasty consumer-facing bugs through several rewrites. I was tasked with adding a feature to this system and determined that a substantial rewrite would be necessary to accommodate the feature. I was pleased with myself when my rewrite was bug-free practically on the first try, but I couldn’t help but think I had an unfair advantage because I wasn’t dealing with the stress of getting chewed out. I’m in favor of blameless post-mortems. (The larger antipattern: a System 1 view of bugs as aversive stimuli to be avoided rather than valued stimuli to be approached. Procrastinating on difficult tasks in favor of easy ones can be a really expensive and error-prone way to program; you want to use the context that’s already loaded in your head efficiently and solve central uncertainties before peripheral ones.)

(Of course I’m conveniently leaving out lots of embarrassing failures… for example, I once broke almost all of the clickable buttons on our website for several hours before someone noticed. I don’t think I ever heard of anyone else screwing up that badly. In my post-mortem I determined that my feeling that I had already done more than could reasonably be expected on this particular task caused me to stop working when I ran out of steam, instead of putting in the necessary effort to make sure my solution was totally correct. It took me a while to get good at writing low-bug code, and arguably I’m still pretty bad at it.)

Do you have any thoughts on the best routes to push on this?

Hm, I guess one idea might be to try to obtain the ear of leading thinkers in the programming world--Joel Spolsky and Jeff Atwood types--to blog more about the virtues of bug prevention and approaches to it. My impression is that leading bloggers have more influence on software development culture than top CTOs or professors. But it wouldn't necessarily need to be leading thinkers spearheading this; anyone can submit a blog post to Hacker News. I think what you'd want to do is compile a big list of devastating bugs and then sell a story that as software eats the world more and more, we need to get better at making it reliable.

I'm not hopeful that there's an easy solution (if there were, I think it would already be used in the industry), and I don't think you'd get up to total reliability.

Nonetheless it seems likely that there are things people can do that increase their bug rate, and there are probably things they can do that would decrease it. These might be costly things -- perhaps writing detailed architectural plans for the software and getting these critiqued and double-checked by a team, who also double-check that the separate parts do the right thing with respect to the architecture.

Maybe you can only cut your bug rate by 50%, at the cost of going at only 5% of normal speed. In that case there may be no commercially useful skills here. But it still seems like it would be useful to work out what kind of things help to do that.

I don't know if this is commercially feasible, but I do like this idea from the perspective of building civilizational competence at getting things right on the first try.

Maybe I'm missing the point, but good old-fashioned formal verification -- which seems to be what MIRI already intends to use -- would largely mitigate the issue of bugs in the classical sense. The greatest problem lies outside the issue of verifying that the program functions in the obvious sense; it lies in ensuring that the specification refers to that which we want it to refer to: the full complement of human values. This is a philosophical and scientific problem that is likely beyond the purview of formal verification or best programming practices.

gjm:

I think you may indeed be missing the point, which is not

  • that formal bug-minimizing software development techniques will suffice to produce safe AI

but

  • that producing safe AI requires (what is currently) extraordinary success at "getting things right first time", so that
    • it would be beneficial to get better at doing that, and to foster a culture of trying to do it,
    • and practice at making bug-free software might be an effective way to do so
      • not least because value-specification and programming are sufficiently parallel that some of the same techniques or patterns of thought might be useful in both.

However, what I take to be your main point -- that value-specification and software development are in fact not terribly similar, so that practice at one may not help much with the other -- is as applicable either way.

Yes, gjm's summary is right.

I agree that there are some important disanalogies between the two problems. I thought software development was an unusually good domain to start trying to learn the general skill, mostly because it offers easy-to-generate complex challenges where it's simple to assess success.

that producing safe AI requires (what is currently) extraordinary success at "getting things right first time", so that

See my reply above - this line of thought is fundamentally mistaken. Simulation testing is a far more effective general solution than formal verification.

gjm:

Just to clarify: I was stating the thesis of the OP, not asserting it. Neither am I now denying it. (I don't find myself altogether convinced either by the OP or by your arguments for why the OP is "probably completely mistaken".)


Giving an artificial intelligence good values may be a particularly important challenge, and one where we need to be correct first time.

This view is probably completely mistaken, for two separate reasons:

  1. We can test AI architectures at different levels of scaling. A human brain is just a scaled-up primate brain, which suggests that all the important features - how value acquisition works, empathy, altruism, value alignment, whatever - can be tested first in AGI that is near human level.

  2. We have already encountered numerous large-scale 'one-shot' engineering challenges, and there is already an extremely effective general solution. If you have a problem that you have to get right the first try, you change this into an iterative problem by creating a simulation framework. Doing that for AGI may involve creating the Matrix, more or less, but that isn't necessarily any more complex than creating AGI in the first place.

To me these look like (pretty good) strategies for getting something right the first time, not in opposition to the idea that this would be needed.

They do suggest that an environment which is richer than just "submit perfect code without testing" might be a better training ground.

To clarify, I was not critiquing the idea that we need to get "superintelligence unleashed on the world" correct the first try - that of course I do agree with. I was critiquing the more specific idea that we need to get AGI morality/safety correct the first try.

One could compare to ICBM missile defense systems. The US (and other nations) have developed that tech, and it's a case where you have to get the deployed product "right the first try". You can't test it in the real world, but you absolutely can do iterative development in simulation, and that really is the only sensible way to develop such tech. Formal verification is about as useful for AGI safety as it is for testing ICBM defense - not much use at all.

I'm not sure how much we are disagreeing here. I'm not proposing anything like formal verification. I think development in simulation is likely to be an important tool in getting it right the first time you go "live", but I also think there may be other useful general techniques/tools, and that it could be worth investigating them well in advance of need.

Agreed. In particular I think IRL (Inverse Reinforcement Learning) is likely to turn out to be very important. Also, it is likely that the brain has some clever mechanisms for things like value acquisition or IRL, as well as empathy/altruism, and figuring out those mechanisms could be useful.

This is a really cool idea. The things you need to do to get things right the first time in a novel situation vary heavily according to what the problem is. Programming would be useful training for many tasks, since many situations require you to consider a decision and then commit to a series of actions that follow logically from the situation.

However, there are many other skills. To successfully complete some tasks, you may need an extremely good memory so that you can remember to follow all the steps. Others may require you to be free from biases, understand people, and understand risk. For many situations, the best way to make decisions is actually intuitive, and there the programming practice will be much less useful. But it'd still be helpful in many situations.

From http://www.infoq.com/news/2015/05/provably-correct-software :

Companies like ASML and Philips Healthcare report 40% or greater reduction in costs and time-to-market using formal methods to deliver highly reliable software control products. [...]

NASA uses the Spin model checker to develop and verify Mars rover control software – google "mars code". Toyota, after its debacle with Camry unintended acceleration, adopted the Altran SPARK formal-methods tools to develop reliable software (toyota ada spark).
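
Spin and SPARK are heavyweight tools, but the core idea of explicit-state model checking is simple enough to sketch. The toy below (plain Python added for illustration, far simpler than what those tools actually do) exhaustively explores every reachable state of a tiny two-process lock protocol and asserts mutual exclusion in each one.

```python
# Toy explicit-state model check: enumerate all reachable states of a tiny
# two-process lock protocol and assert mutual exclusion in every one of them.
INITIAL = ("idle", "idle", None)  # (pc of process 0, pc of process 1, lock holder)

def successors(state):
    pcs, holder = list(state[:2]), state[2]
    out = []
    for i in (0, 1):
        new_pcs, new_holder = pcs[:], holder
        if pcs[i] == "idle":                          # request the lock
            new_pcs[i] = "waiting"
        elif pcs[i] == "waiting" and holder is None:  # acquire the lock
            new_pcs[i], new_holder = "critical", i
        elif pcs[i] == "critical":                    # release the lock
            new_pcs[i], new_holder = "idle", None
        else:
            continue                                  # blocked: no move for i
        out.append((new_pcs[0], new_pcs[1], new_holder))
    return out

def check():
    seen, frontier = {INITIAL}, [INITIAL]
    while frontier:
        state = frontier.pop()
        assert not (state[0] == "critical" and state[1] == "critical"), state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    print(f"mutual exclusion holds in all {len(seen)} reachable states")

check()
```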

I think there may be an unfounded assumption here - that an unfriendly AI would be the result of some sort of bug, or coding errors that could be identified ahead of time and fixed.

I rather suspect those sorts of errors would not result in "unfriendly"; they would result in a crashing, nonsensical, or non-functional AI.

Presumably part of the reason the whole friendly/non-friendly thing is an issue is that our models of cognition are crude, and a ton of complex high-order behavior is a result of emergent properties in a system, not of explicit coding. I would expect the sort of error that accidentally turns an AI into a killer robot to be subtle enough that it is only comprehensible in hindsight, if then. (Note this does not mean intentionally making a hostile AI is all that hard. I can make hostility, or practical outcomes identical to it, without AI at all, so it stands to reason that could carry over.)

I'm not suggesting that the problems would come from what we normally think of as software bugs (though see the suggestion in this comment). I'm suggesting that they would come from a failure to specify the right things in a complex scenario -- and that this problem bears enough similarities to software bugs that the latter could be a good test bed for working out how to approach such problems.

The flaws leading to an unexpectedly unfriendly AI certainly might lead back to a flaw in the design - but I think it is overly optimistic to think that the human mind (or a group of minds, or perhaps any mind) is capable of reliably creating specs that are sufficient to avoid this. We can and do spend tremendous time on this sort of thing already, and bad things still happen. You hold the shuttle up as an example of reliability done right (which it is) - but it still blew up, because not all of shuttle design is software. In the same way, the issue could arise from some environmental issue that alters the AI in such a way that it is unpredictable - power fluctuations, bit flip, who knows. The world is a horribly non-deterministic place, from a human POV.

By way of analogy - consider weather prediction. We have worked on it for all of history, we have satellites and supercomputers - and we are still only capable of accurate predictions for a few days or a week, getting less and less accurate as we go. This isn't a case of making a mistake - it is a case of a very complex end-state arising from simple beginnings, and of lacking the ability to make perfectly accurate predictions about some things. To put it another way - it may simply be that the problem is not computable, now or with any foreseeable technology.

I'm not sure quite what point you're trying to make:

  • If you're arguing that with the best attempt in the world it might be we still get it wrong, I agree.
  • If you're arguing that greater diligence and better techniques won't increase our chances, I disagree.
  • If you're arguing something else, I've missed the point.

Fair question.

My point is that if improving techniques could take you from (arbitrarily chosen percentages here) a 50% chance that an unfriendly AI would cause an existential crisis, to 25% chance that it would - you really didn't gain all that much, and the wiser course of action is still not to make the AI.

The actual percentages are wildly debatable, of course, but I would say that if you think there is any chance - no matter how small - of triggering ye olde existential crisis, you don't do it - and I do not believe that technique alone could get us anywhere close to that.

The ideas you propose in the OP seem wise, and good for society - and wholly ineffective in actually stopping us from creating an unfriendly AI. The reason is simply that the complexity defies analysis, at least by human beings. The fear is that unfriendliness arises from unintended design consequences, from unanticipated system effects rather than from bugs in code or faulty intent.

It's a consequence of entropy - there are simply far, far more ways for something to get screwed up than for it to be right. So unexpected effects arising from complexity are far, far more likely to cause issues than to be beneficial unless you can somehow correct for them - planning ahead will only get you so far.

Your OP suggests that we might be more successful if we got more of it right "the first time". But - things this complex are not created, finished, de novo - they are an iterative, evolutionary task. The training could well be helpful, but I suspect not for the reasons you suggested. The real trick is to design things so that when parts go wrong, the whole still works correctly. You have to plan for and expect failure, or that inevitable failure is the end of the line.

My point is that if improving techniques could take you from (arbitrarily chosen percentages here) a 50% chance that an unfriendly AI would cause an existential crisis, to 25% chance that it would - you really didn't gain all that much, and the wiser course of action is still not to make the AI.

That's not the choice we are making. The choice we are making is to decide to develop those techniques.

The techniques are useful, in and of themselves, without having to think about utility in creating a friendly AI.

So, yes, by all means, work on better skills.

But - the point I'm trying to make is that while they may help, they are insufficient to provide any real degree of confidence in preventing the creation of an unfriendly AI, because the emergent effects that would likely be responsible are not amenable to planning ahead of time.

It seems to me your original proposal is the logical equivalent to "Hey, if we can figure out how to better predict where lightning strikes - we could go there ahead of time and be ready to stop the fires quickly, before the spread". Well, sure - except that sort of prediction would depend on knowing ahead of time the outcome of very unpredictable events ("where, exactly, will the lightning strike?") - and it would be far more practical to spend the time and effort on things like lightning rods and firebreaks.

But - the point I'm trying to make is that while they may help, they are insufficient to provide any real degree of confidence in preventing the creation of an unfriendly AI

Basically you attack a strawman.

and it would be far more practical to spend the time and effort on things like lightning rods and firebreaks.

Unfortunately I don't think anybody has proposed an idea of how to solve FAI that's as straightforward as building lightning rods.

In computer security there is the idea of "defense in depth". You try to get every layer right and as secure as possible.

Strawman?

"... idea for an indirect strategy to increase the likelihood of society acquiring robustly safe and beneficial AI." is what you said. I said preventing the creation of an unfriendly AI.

OK, valid point. Not the same.

I would say the items described will do nothing whatsoever to "increase the likelihood of society acquiring robustly safe and beneficial AI."

They are certainly of value in normal software development, but it seems increasingly likely as time passes without a proper general AI actually being created that such a task is far, far more difficult than anyone expected, and that if one does come into being, it will happen in a manner other than the typical software development process as we do things today. It will be an incremental process of change and refinement seeking a goal, is my guess. Starting from a great starting point might presumably reduce the iterations a bit, but other than a head start toward the finish line, I cannot imagine it would affect the course much.

If we drop single cell organisms on a terraformed planet, and come back a hundred million years or so - we might well expect to find higher life forms evolved from it, but finding human beings is basically not gonna happen. If we repeat that - same general outcome (higher life forms), but wildly differing specifics. The initial state of the system ends up being largely unimportant - what matters is evolution, the ability to reproduce, mutate and adapt. Direction during that process could well guide it - but the exact configuration of the initial state (the exact type of organisms we used as a seed) is largely irrelevant.

re. Computer security - I actually do that for a living. Small security rant - my apologies:

You do not actually try to get every layer "as right and secure as possible." The whole point of defense in depth is that any given security measure can fail, so to ensure protection, you use multiple layers of different technologies so that when (not if) one layer fails, the other layers are there to "take up the slack", so to speak.

The goal on each layer is not "as secure as possible", but simply "as secure as reasonable" (you seek a "sweet spot" that balances security and other factors like cost), and you rely on the whole to achieve the goal. Considerations include cost to implement and maintain, the value of what you are protecting, the damage caused should security fail, who your likely attackers will be and their technical capabilities, performance impact, customer impact, and many other factors.

Additionally, security costs at a given layer do not increase linearly, so making a given layer more secure, while often possible, quickly becomes inefficient. Example - Most websites use a 2k SSL key; 4k is more secure, and 8k is even moreso. Except - 8k doesn't work everywhere, and the bigger keys come with a performance impact that matters at scale - and the key size is usually not the reason a key is compromised. So - the entire world (for the most part) does not use the most secure option, simply because it's not worth it - the additional security is swamped by the drawbacks. (Similar issues occur regarding cipher choice, fwiw).

In reality - in nearly all situations, human beings are the weak link. You can have awesome security, and all it takes is one bozo and it all comes down. SSL is great, until someone manages to get a key signed fraudulently and bypasses it entirely. Packet filtering is dandy, except that Fred in accounting wanted to play Minecraft and opened up an SSH tunnel, incorrectly. MFA is fine, except the secretary who logged into the VPN using MFA just plugged the thumb drive they found in the parking lot into their PC and actually ran "Elf Bowling", and now your AD is owned and the attacker is escalating privilege from inside. So your hard candy shell doesn't matter much; he's in the soft, chewy center. THIS, by the way, is where things like education are of the most value - not in making the very skilled more skilled, but in making the clueless somewhat more clueful. If you want to make a friendly AI - remove human beings from the loop as much as possible...

Ok, done with rant. Again, sorry - I live this 40-60 hours a week.

I disagree that "you really didn't gain all that much" in your example. There are possible numbers such that it's better to avoid producing AI, but (a) that may not be a lever which is available to us, and (b) AI done right would probably represent an existential eucatastrophe, greatly improving our ability to avoid or deal with future threats.

I have an intellectual issue with using "probably" before an event that has never happened before, in the history of the universe (so far as I can tell).

And - if I am given the choice between slow, steady improvement in the lot of humanity (which seems to be the status quo), and a dice throw that results in either paradise, or extinction - I'll stick with slow steady, thanks, unless the odds were overwhelmingly positive. And - I suspect they are, but in the opposite direction, because there are far more ways to screw up than to succeed, and once the AI is out - you no longer have a chance to change it much. I'd prefer to wait it out, slowly refining things, until paradise is assured.

Hmm. That actually brings a thought to mind. If an unfriendly AI was far more likely than a friendly one (as I have just been suggesting) - why aren't we made of computronium? I can think of a few reasons, with no real way to decide. The scary one is "maybe we are, and this evolution thing is the unfriendly part..."

Meta: I'd love to know whether the downvotes are because people don't like the presentation of undeveloped ideas like this, or because they don't think the actual idea is a good one.

(The first would put me off posting similar things in the future, the second would encourage me as a feedback mechanism.)

Elo:

I would suggest that software is not a good domain to start this project in because it is by nature able to be replicated in various forms. (Ctrl-C, Ctrl-V, for one.)

Consider other areas instead: for example, refining a piece of wood, or any domain where you are working with a rare natural formation that has value due to its form (rare minerals or crystals; ceramics, where once fired you must start again if it doesn't work; pure substances that, once contaminated, will need to be purified again).

Even simpler models of a perfection process are possible: peeling an egg perfectly, or shelling a nut.

It's a matter of putting extra value on a perfectly finished product, in the way a premium diamond is worth a lot more than a 99% pure one.

Side note: the advantage of software is that it scales easily, so many people can be taught at once.

Software may not be the best domain, but it has a key advantage over the other suggestions you are making: it's easy to produce novel challenges that are quite different from the previous challenges.

In a domain such as peeling an egg, it's true that peeling an individual egg has to be done correctly first time, but one egg is much like another, so the skill transfers easily. On the other hand one complex programming challenge may be quite different from another, so the knowledge from having solved one doesn't transfer so much. This should I think help make sure that the skill that does transfer is something closer to a general skill of knowing how to be careful enough to get it right first time.

Elo:

There are lots of factors involved in peeling a perfect egg; most seem to matter before you hand it to a person to peel it.

The most applicable areas where "get it right the first time" seems to apply are areas with a high cost of failure (this priceless gem will never be the same again, If I care enough about the presentation of this dish I will have to boil another egg).

This also relates well to deliberate practice, where one technique for making practice of a skill harder is to be less tolerant of errors. A novel challenge is good, but in most real-world novel situations with high-cost failures, the situation is not easy to replicate.

Another area that comes to mind with high cost of failure and "get it right" model would be hostage negotiations.

Most challenges we can approach with trial-and-error, so many of our habits and social structures are set up to encourage this.

This hasn't always been the case. Throughout history, leaders have had to get it right the first time, especially in wartime. I'd bet that someone more versed in history than I am could give lots of examples. Granted, in our current society, which is saturated with complexity beyond the limits of human comprehension, trial and error seems like a viable way. But hierarchical processes seem to work quite well for major human endeavors like the Manhattan Project.

Good point that this hasn't always been the case. However, we also know that people made a lot of mistakes in some of these cases. It would be great to work out how we can best approach such challenges in the future.