Meetup : Tel Aviv Meetup: Rump Session

1 JoshuaFox 21 November 2014 11:12AM

Discussion article for the meetup : Tel Aviv Meetup: Rump Session

WHEN: 25 November 2014 07:00:00PM (+0200)

WHERE: Google, Electra Tower, 29th Floor, Yigal Alon Street 98, Tel Aviv-Yafo, Israel


It's going to be a Rump Session. Each speaker gets 4 minutes to talk, with a 3 minute encore if we applaud hard enough. So, come with an idea or two. It's optional, but everyone is encouraged to talk about whatever they want: science, poetry, history, decision theory, rationality, life, the universe, and everything else.

We'll start the meetup at 19:00, and we'll go on for as long as we like. Feel free to come a little bit later.

We'll meet on the 29th floor of the building. (Note: Not the 34th, where Google Campus is.) If you arrive and can't find your way around, call Anatoly, who is very graciously hosting us, at 054-245-1060, or Joshua at 054-569-1165.

Why come:

  • You'll get to hang out with awesome people and have fun!
  • You'll hear ideas that you have never heard, in snippets short enough that you never get bored.
  • You'll get an appreciative audience, in snippets short enough that you never run out of things to say.

If you have any questions, feel free to send me a Private Message.


Meetup : Prof. Roman Yampolskiy on approaches to AGI risk, Tel Aviv

2 JoshuaFox 25 May 2014 11:27AM

Discussion article for the meetup : Prof. Roman Yampolskiy on approaches to AGI risk, Tel Aviv

WHEN: 28 May 2014 06:34:45AM (+0300)

WHERE: 98 Yigal Alon St., 33rd floor, Tel Aviv

On Wednesday May 28, Roman Yampolskiy will speak on approaches to AGI risk. See his overview article "Responses to Catastrophic AGI Risk," coauthored with Kaj Sotala.

It will be at Google Israel's offices, Electra Tower, 98 Yigal Alon St., Tel Aviv.

We'll meet at the 29th floor at 20:00, then go right up to the 33rd floor and start the talk at 20:30.

If you can't find us, call Anatoly, who is graciously hosting us, at 054-245-1060, or me at 054-569-1165.


Finding LessWrongers on LinkedIn

9 JoshuaFox 13 May 2014 11:46AM

We've had a few initiatives recently to connect LessWrongers for business networking.

Here is another one: a LW group at LinkedIn. Think of it not as a discussion group or online community, but as a "tag" on your LinkedIn profile, to help other LWers find you.

(I've turned off the discussion functionality on the LI group, since the Google Group or LessWrong.com are better for that purpose.)

Join here: http://www.linkedin.com/groups/LessWrong-8108647 

Another idea: Invite other LessWrongers to connect on LinkedIn, including not only those on the LW LI group, but any you know from the online community. It's a good way to get them in your Rolodex.* (You may have to dig up their email first. Then, add them at the LI menu Connections->Add Connections->Any Email->Add by individual Email.) 

For reference, here are recent posts on connecting between LWers:

  1. A Google Group for networking
  2. Updating your LW profile page  
  3. A survey, and reports about LessWrong as social catalyst.


I'm interested in seeing which of these initiatives actually helps people, so please let us know.

* Is the Rolodex dead enough that we can use that metaphor again?


LessWrong as social catalyst

14 JoshuaFox 28 April 2014 02:10PM

I'd like to ask everyone: Have LessWrong.com and related online rationalist/transhumanist/Singularitarian communities connected you to people for purposes beyond discussion?

Bonus points if an online contact led to an important connection. We already know that face-to-face meetups are a great way to meet people, so I'm curious about connections triggered by online interaction.  

I have seen scattered mentions that each of the above has happened, but not enough to get a strong impression of what's going on.

Please answer in comments below. By doing so, you'll be providing social proof that LW and the like can accomplish these things, and so encourage more to happen, increasing happiness in the world.

I added one myself to start.

Business Networking through LessWrong

31 JoshuaFox 02 April 2014 05:39PM

Is anyone interested in contacting other people in the LessWrong community to find a job, employee, business partner, co-founder, adviser, or investor?

Connections like this develop inside ethnic and religious groups, as well as among university alums or members of a fraternity. I think that LessWrong can provide the same value.

For example, LessWrong must have plenty of skilled software developers in dull jobs, who would love to work with smart, agenty rationalists. Likewise, there must be some company founders or managers who are having a very hard time finding good software developers. 

A shared commitment to instrumental and epistemic rationality should be a good starting point, not to mention a shared memeplex to help break the ice. (Paperclips! MoR!)

Besides being fun, working together with other rationalists could be a good business move.

As a side-benefit, it also has good potential to raise the sanity waterline and help further develop new rationality skills, both personally and as a community.

Naturally, such a connection is not guaranteed to produce results. But it's hard to find the right people to work with: Why not try this route? And although you can cold-contact someone you've seen online, you don't know who's interested in what you have to offer, so I think more effort is needed to bootstrap such networking.

I'd like to gauge interest. (Alexandros has volunteered to help.) If you might be interested in this sort of networking, please fill out this short Google Form [Edit: Survey closed as of April 15]. I'll post an update about what sort of response we get.

Privacy: Although the main purpose of this form is to gauge interest, and more details may be needed to form good connections, the info collected might be enough to get some contacts going. So, we might use this information to personally connect people. We won't share the info or build any online group with it. If we get a lot of interest, we may later create some sort of online mechanism, but we’ll be sure to get your permission before adding you.

---------

Edit April 6: We're still seeing that people are filling out the form, so we'll wait a week or two, and report on results.

---------

Edit April 15: See a summary of the results below

Snowdenizing UFAI

5 JoshuaFox 05 December 2013 02:42PM

Here is a suggestion for slowing down future secretive and unsafe UFAI projects.

Take the American defense and intelligence community as a case in point. They are a top candidate for the creation of Artificial General Intelligence (AGI): They can get the massive funding, and they can get some top (or near-top) brains on the job. The AGI will be unfriendly, unless friendliness is a primary goal from the start.

The American defense and intelligence community created the Manhattan Project, which is the canonical example for a giant, secret, leading-edge science-technology project with existential-risk implications.

David Chalmers (2010): "When I discussed [AI existential risk] with cadets and staff at the West Point Military Academy, the question arose as to whether the US military or other branches of the government might attempt to prevent the creation of AI or AI+, due to the risks of an intelligence explosion. The consensus was that they would not, as such prevention would only increase the chances that AI or AI+ would first be created by a foreign power."

Edward Snowden broke the intelligence community's norms by reporting what he saw as tremendous ethical and legal violations. This requires an exceptionally well-developed personal sense of ethics (even if you disagree with those ethics). His actions have drawn a lot of support from those who share his values. Many who condemn him as a traitor are still criticizing government intrusions on the basis of his revelations.

When the government AGI project starts rolling, will it have Snowdens who can warn internally about Unfriendly AI (UFAI) risks? They will probably be ignored and suppressed--that's how it goes in hierarchical bureaucratic organizations. Will these future Snowdens have the courage to keep fighting internally, and eventually to report the risks to the public or to their allies in the Friendly AI (FAI) research community?

Naturally, the Snowden scenario is not limited to the US government. We can seek ethical dissidents, truthtellers, and whistleblowers in any large and powerful organization that does unsafe research, whether a government or a corporation.

Should we start preparing budding AGI researchers to think this way? We can do this by encouraging people to take consequentialist ethics seriously, which by itself can lead to Snowden-like results, and LessWrong is certainly working on that. But another approach is to start talking more directly about the "UFAI Whistleblower Pledge":

I hereby promise to fight unsafe AGI development in whatever way I can, through internal channels in my organization, by working with outside allies, or even by revealing the risks to the public.

If this concept becomes widespread, and all the more so if people sign on, the threat of ethical whistleblowing will hover over every unsafe AGI project. Even with all the oaths and threats they use to make new employees keep secrets, the notion that speaking out on UFAI is deep in the consensus of serious AGI developers will cast a shadow on every project.

To be clear, the beneficial effect I am talking about here is not the leaks--it is the atmosphere of potential leaks, the lack of trust by management that researchers are completely committed to keeping any secret. For example, post-Snowden, the intelligence agencies are requiring that sensitive files be accessed only by two people working together, and they are probably tightening their approval guidelines and so rejecting otherwise suitable candidates. These changes make everything more cumbersome.

In creating the OpenCog project, Ben Goertzel advocated total openness as a way of accelerating the progress of those who are willing to expose any dangerous work they might be doing--even if this means that the safer researchers are giving their ideas to the unsafe, secretive ones.

On the other hand, Eliezer Yudkowsky has suggested that MIRI keep its AGI implementation ideas secret, to avoid handing them to an unsafe project. (See "Evaluating the Feasibility of SI's Plans," and, if you can stomach some argument from fictional evidence, "Three Worlds Collide.") Encouraging openness and leaks could endanger Eliezer's strategy. But if we follow Eliezer's position, a truly ethical consequentialist would understand that exposing unsafe projects is good, while exposing safer projects is bad.

So, what do you think? Should we start signing as many current and upcoming AGI researchers as possible to the UFAI Whistleblower Pledge, or work to make this an ethical norm in the community?

Why officers vs. enlisted?

13 JoshuaFox 30 October 2013 08:14PM

It's always puzzled me that, in armies, officers form a separate hierarchical ladder from the NCOs and enlisted soldiers.

Armies could have a single hierarchy, top to bottom, as in the simplified diagram below on the left. Instead, all armies have two distinct ladders, with one strictly above the other, as on the right. (Reminds me of those wacky non-standard integers.)

[Diagram: a single ladder on the left; on the right, two separate ladders, the officers' ladder strictly above the enlisted/NCO ladder.]

The usual answers are obvious but irrelevant: Yes, some people shoot straight to a position high on the ladder. You could do that with either model. Yes, even when those lower down on the ladder have more experience and wisdom, it can make practical sense to have a hierarchy. Yes, the higher someone is, the higher the level of the decisions they make. You could likewise do these on a one-ladder model.

It's said that officers "decide," while non-officers "just carry out  orders"; or that officers choose strategy, and non-officers do tactics. But everyone makes decisions, on their own level. A private makes decisions for himself, a corporal for three soldiers, and a colonel for a thousand, each one in the context of their orders from above. One soldier's strategy is his superior's tactics. And the distinction is not based on command: New army doctors automatically become officers, even if they don't command anyone. Doctors are non-combatants, but fighter pilots are combatants par excellence, don't command anyone, and are all officers.

These answers don't explain why there need to be two ladders. I asked at Quora without a convincing answer. Historically, the distinction was based on social classes, but that doesn't explain why every army follows this arrangement, including those in very different societies.

Similarly: What's a corporate executive? (I'm talking about large companies here; small companies and startups are different.) I understand that there is a management hierarchy, but why the arbitrary distinction between a senior manager and a junior executive? Aren't those just two rungs on the ladder? In corporate-speak, an executive is called a "decision maker." What a strange term! Isn't a manager or even a lowly "individual contributor" also a decision maker -- at the scope that their own managers allow? (I should add that the two-ladder system is not as developed in business as it is in the army or in medicine. There is no career ladder for non-execs that extends arbitrarily high, though always below the execs.)

Not all professions work that way. Actuaries have ten levels, based on passing a sequence of exams. And though some areas of engineering distinguish an engineer from a technician, software engineering has no such dichotomy: Some software engineers make more money, and some make broader decisions or manage others, but there is no two-way split.

In medicine, on the other hand, there is a clear distinction between doctors and nurses. There are different status levels among doctors and among nurses, but a PhD in nursing stands on the other side of a clear border from a beginning MD. Similarly with lawyers and paralegals. These dichotomies stem from licensing restrictions, which in turn are descended from medieval guild practices. But why does it have to be this way? Why not just rank medical personnel, or legal personnel, in a single continuum from practical nurse through rockstar brain surgeon? (Is that a title?) There would still be the understanding that some people will never climb beyond a certain point, while others can jump straight to a higher rung.

The answer lies in LessWrong's concept of "agentiness": making "choices so as to maximize the fulfillment of explicit desires, given explicit beliefs." Less abstractly, it is sometimes described as "reliability and responsibility." Agenty types get to be called "Player Characters" or heroes. ("Agenty" and "agentiness" are made-up words; the standard terminology is "agent" and "agency." I think "agenty" was coined to point out that while all humans are agents to some extent, some do it far better than others.)

In the organizational context, officers and executives are meant to be agenty, while enlisted/NCO and non-executives are not. The officers and executives plan towards achieving goals, while everyone else executes defined tasks. The officers and executives make high-variance decisions, with high risks and high returns, while everyone else has the job of just doing their job consistently and not messing up.

Is agentiness a natural kind, a cluster in thingspace, a joint-carving concept? Might agentiness just be a mix of features that occur to varying degrees in various contexts?

We might say that agentiness is a continuum: Everyone has some, but some people have more than others. Lower-downs sometimes have goals, and higher-ups often act like cogs. Moreover, the agentiness of officers and executives is strictly in the context of their superiors' goals: They may be agenty, but not for their individual goals. It would be more accurate to say that in their roles they are meant to be agenty, on behalf of the organization.

Some people are non-agenty in some of their social roles and agenty in others. For example, I know workers who readily admit to being lowly cogs in a machine, but who have tremendous achievements in setting up and leading non-profits outside work hours. Some hard-driving workaholics are milquetoasts at home. Some caring, wise, foresightful parents are limp rags at work.

But agentiness is a real concept, at least so far as the officers and executives go. Their roles are implicitly defined by agentiness. Armies and corporations decide which people have it (or at least are meant to). These organizations agree with LessWrong that agentiness is a natural kind.

Bets on an Extreme Future

1 JoshuaFox 13 August 2013 08:05AM

Betting on the future is a good way to reveal true beliefs.

As one example of such a bet on a key debate about a post-human future, I'd like to announce here that Robin Hanson and I have made the following agreement. (See also Robin's post at Overcoming Bias):

We, Robin Hanson and Joshua Fox, agree to bet on which kind of artificial general intelligence (AGI) will dominate first, once some kind of AGI dominates humans. If the AGI are closely based on or derived from emulations of human brains, Robin wins, otherwise Joshua wins. To be precise, we focus on the first point in time when more computing power (gate-operations-per-second) is (routinely, typically) controlled relatively-directly by non-biological human-level-or-higher general intelligence than by ordinary biological humans. (Human brains have gate-operation equivalents.)

If at that time more of that computing power is controlled by emulation-based AGI, Joshua owes Robin whatever $3000 invested in S&P500-like funds today is worth then. If more is controlled by AGI not closely based on emulations, Robin owes Joshua that amount. The bet is void if the terms of this bet make little sense then, such as if it becomes too hard to say if capable non-biological intelligence is general or human-level, if AGI is emulation-based, what devices contain computing power, or what devices control what other devices. But we intend to tolerate modest levels of ambiguity in such things.

[Added Aug. 17:] To judge if “AGI are closely based on or derived from emulations of human brains,” judge which end of the following spectrum is closer to the actual outcome. The two ends are 1) an emulation of the specific cell connections in a particular human brain, and 2) general algorithms of the sort that typically appear in AI journals today.
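The payout arithmetic in the terms above is simple compounding. Here is a minimal sketch; the 7% annual return and 30-year horizon are purely illustrative assumptions of mine, not part of the bet's terms:

```python
def payout(principal=3000.0, annual_return=0.07, years=30):
    """Compound the $3000 stake, as if invested in S&P500-like funds,
    over the years until the bet settles. The return rate and horizon
    are hypothetical placeholders."""
    return principal * (1 + annual_return) ** years

print(round(payout(), 2))
```

At a 0% real return the loser would simply owe the original $3000; any growth in the funds raises the amount owed accordingly.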

It's a bet on the old question: ems vs. de novo AGI. Kurzweil and Kapor bet on another well-known debate: will machines pass the Turing Test? It would be interesting to list some other key debates that we could bet on.

But it's hard to make a bet when settling the bet may occur in extreme conditions:

  • after human extinction,
  • in an extreme utopia,
  • in an extreme dystopia, or
  • after the bettors' minds have been manipulated in ways that redefine their personhood: copied thousands of times, merged with other minds, etc.

MIRI has a "techno-volatile" world-view: We're not just optimistic or pessimistic about the impact of technology on our future. Instead, we predict that technology will have an extreme impact, good or bad, on the future of humanity. In these extreme futures, the fundamental components of a bet--the bettors and the payment currency--may be missing or altered beyond recognition.

So, how can we calibrate our probability estimates about extreme events? One way is by betting on how people will bet in the future, when they are closer to the events, on the assumption that they'll know better than we do. Though this is an indirect and imperfect method, it might be the best we have for calibrating our beliefs about extreme futures.

For example, Robin Hanson has suggested a market on tickets to a survival shelter as a way of betting on an apocalypse. However, this is only relevant for futures where shelters can help, where there is time to get to one while the ticket holder is alive, and where the social norm of honoring tickets still applies.

We could also define bets on the progress of MIRI and similar organizations. Looking back on the years since 2005, when I started tracking this, I would have liked to bet on, or at least discuss, certain milestones before they happened. They served as (albeit weak) arguments from authority or from social proof for the validity of MIRI's ideas. Some examples of milestones that have already been reached:

  • SIAI's budget passing $500K per annum
  • SIAI getting 4 full-time-equivalent employees
  • SIAI publishing its fourth peer-reviewed paper
  • The establishment of a university research center in relevant fields
  • The first lecture on the core FAI thesis in an accredited university course
  • The first article on the core FAI thesis in a popular science magazine
  • The first mention of the core FAI thesis (or of SIAI as an organization) in various types of mainstream media, with a focus on the most prestigious (NPR for radio, New York Times for newspapers).
  • The first (indirect/direct) government funding for SIAI

Looking to the future, we can bet on some other FAI milestones. For example, we could bet on these coming true by a certain year.

  • FAI research in general (or: organization X) will have Y dollars in funding per annum (or: Z full-time researchers).
  • Eliezer Yudkowsky will still be working on FAI.
  • The intelligence explosion will be discussed on the floor of Congress (or: in some parliament; or: by a head of state somewhere in the world).
  • The first academic monograph on the core FAI thesis will be published (apparently that will be Nick Bostrom's).
  • The first master's thesis/PhD dissertation on the core FAI thesis will be completed.
  • "Bill Gates will read at least one of 'Our Final Invention' or 'Superintelligence' in the next 2 years" (This already appears on PredictionBazaar.)

(Some of these will need more refinement before we can bet on them.)

Another approach is to bet on technology trends: brain scanning resolution, prices for computing power, etc. But these bets are about a Kurzweilian Law of Accelerating Returns, which may be quite distinct from the Intelligence Explosion and other extreme futures we are interested in.

Many bets only make sense if you believe that a soft takeoff is likely. If you believe that, you could bet on AI events while still allowing the bettors a few years to enjoy their winnings. 

You can make a bet on hard vs. soft takeoff simply by setting your discount rate. If you're 20 years old and think that the economy as we know it will end instantly in, for example, 2040, then you won't save for your retirement. (See my article at H+Magazine.) But such decisions don't pin down your beliefs very precisely: Most people who don't save for their retirement are simply being improvident. Not saving makes sense if the human race is about to go extinct, but also if we are going to enter an extreme utopia or dystopia where your savings have no meaning. Likewise, most people save for retirement simply out of old-fashioned prudence, but you might build up your wealth in order to enjoy it pre-Singularity, or in order to take it with you to a post-Singularity world in which "old money" is still valuable.
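The retirement decision above can be framed as a simple expected-value calculation. This is only a sketch with made-up numbers, and it deliberately ignores the caveat above that savings might carry over into a post-Singularity world:

```python
def expected_value_of_saving(amount, annual_return, years, p_economy_ends):
    """Savings pay off only in worlds where the economy as we know it
    still exists at retirement; in the other worlds (extinction, or an
    extreme utopia/dystopia) they are modeled as worthless.
    All parameters are illustrative assumptions."""
    grown = amount * (1 + annual_return) ** years
    return (1 - p_economy_ends) * grown

# Illustrative: a 20-year-old's $10,000, compounding 45 years at 5% real return.
print(expected_value_of_saving(10_000, 0.05, 45, 0.0))  # conventional future
print(expected_value_of_saving(10_000, 0.05, 45, 0.9))  # 90% chance savings are moot
```

The point of the sketch is that the decision not to save only weakly pins down beliefs: a high `p_economy_ends` and simple improvidence produce the same observed behavior.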

I'd like to get your opinion: What are the best bets we can use for calibrating our beliefs about the extreme events we are interested in? Can you suggest some more of these indirect markers, or a different way of betting?

The Singularity Wars

52 JoshuaFox 14 February 2013 09:44AM

(This is an introduction, for those not immersed in the Singularity world, to the history of and relationships between SU, SIAI [SI, MIRI], SS, LW, CSER, FHI, and CFAR. It also has some opinions, which are strictly my own.)

The good news is that there were no Singularity Wars. 

The Bay Area had a Singularity University and a Singularity Institute, each going in a very different direction. You'd expect to see something like the People's Front of Judea and the Judean People's Front, burning each other's grain supplies as the Romans moved in.


Evaluating the feasibility of SI's plan

25 JoshuaFox 10 January 2013 08:17AM

(With Kaj Sotala)

SI's current R&D plan seems to go as follows: 

1. Develop the perfect theory.
2. Implement this as a safe, working, Artificial General Intelligence -- and do so before anyone else builds an AGI.

The Singularity Institute is almost the only group working on friendliness theory (although with very few researchers). So, they have the lead on Friendliness. But there is no reason to think that they will be ahead of anyone else on the implementation.

The few AGI designs we can look at today, like OpenCog, are big, messy systems which intentionally attempt to exploit various cognitive dynamics that might combine in unexpected and unanticipated ways, and which have various human-like drives rather than the sort of supergoal-driven, utility-maximizing goal hierarchies that Eliezer talks about, or which a mathematical abstraction like AIXI employs.

A team which is ready to adopt a variety of imperfect heuristic techniques will have a decisive lead on approaches based on pure theory. Without the constraint of safety, one of them will beat SI in the race to AGI. SI cannot ignore this. Real-world, imperfect safety measures for real-world, imperfect AGIs are needed. These may involve mechanisms for ensuring that we can avoid undesirable dynamics in heuristic systems, or AI-boxing toolkits usable in the pre-explosion stage, or something else entirely.

SI’s hoped-for theory will include a reflexively consistent decision theory, something like a greatly refined Timeless Decision Theory.  It will also describe human value as formally as possible, or at least describe a way to pin it down precisely, something like an improved Coherent Extrapolated Volition.

The hoped-for theory is intended to  provide not only safety features, but also a description of the implementation, as some sort of ideal Bayesian mechanism, a theoretically perfect intelligence.

SIers have said to me that SI's design will have a decisive implementation advantage. The idea is that because strap-on safety can’t work, Friendliness research necessarily involves more fundamental architectural design decisions, which also happen to be general AGI design decisions that some other AGI builder could grab and save themselves a lot of effort. The assumption seems to be that all other designs are based on hopelessly misguided design principles. SIers, the idea seems to go, are so smart that they'll build AGI far before anyone else. Others will succeed only when hardware capabilities allow crude near-brute-force methods to work.

Yet even if the Friendliness theory provides the basis for intelligence, the nitty-gritty of SI’s implementation will still be far away, and will involve real-world heuristics and other compromises.

We can compare SI’s future AI design to AIXI, another mathematically perfect AI formalism (though it has some critical reflexivity issues). Schmidhuber, Hutter, and colleagues think that their AIXI can be scaled down into a feasible implementation, and have implemented some toy systems. Similarly, any actual AGI based on SI's future theories will have to stray far from its mathematically perfected origins.

Moreover, SI's future Friendliness proof may simply be wrong. Eliezer writes a lot about logical uncertainty, the idea that you must treat even purely mathematical ideas with the same probabilistic techniques as any ordinary uncertain belief. He pursues this mostly so that his AI can reason about itself, but the same principle applies to Friendliness proofs as well.

Perhaps Eliezer thinks that a heuristic AGI is absolutely doomed to failure; that a hard takeoff immediately after the creation of the first AGI is so overwhelmingly likely that a mathematically designed AGI is the only one that could stay Friendly. In that case, we have to work on a pure-theory approach, even if it has a low chance of being finished first. Otherwise we'll be dead anyway. If an embryonic AGI will necessarily undergo an intelligence explosion, we have no choice but to "shut up and do the impossible."

I am all in favor of gung-ho, knife-between-the-teeth projects. But when you think that your strategy is impossible, then you should also look for a strategy which is possible, if only as a fallback. Thinking about safety theory until drops of blood appear on your forehead (as Eliezer puts it, quoting Gene Fowler) is all well and good. But if there is only a 10% chance of achieving 100% safety (not that there really is any such thing), then I'd rather go for a strategy that provides only a 40% promise of safety, but with a 40% chance of achieving it. OpenCog and the like are going to be developed regardless, and probably before SI's own provably friendly AGI. So, even an imperfect safety measure is better than nothing.

If heuristic approaches have a 99% chance of an immediate unfriendly explosion, then that might be wrong. But SI, better than anyone, should know that any intuition-based probability estimate of “99%” really means “70%”. Even if other approaches are long-shots, we should not put all our eggs in one basket. Theoretical perfection and stopgap safety measures can be developed in parallel.

Given what we know about human overconfidence and the general reliability of predictions, the actual outcome will to a large extent be something that none of us ever expected or could have predicted. No matter what happens, progress on safety mechanisms for heuristic AGI will improve our chances if something entirely unexpected happens.

What impossible thing should SI be shutting up and doing? For Eliezer, it’s Friendliness theory. To him, safety for heuristic AGI is impossible, and we shouldn't direct our efforts in that direction. But why shouldn't safety for heuristic AGI be another impossible thing to do?

(Two impossible things before breakfast … and maybe a few more? Eliezer seems to be rebuilding logic, set theory, ontology, epistemology, axiology, decision theory, and more, mostly from scratch. That's a lot of impossibles.)

And even if safety for heuristic AGIs is really impossible for us to figure out now, there is some chance of an extended soft takeoff that will allow for the possibility of us developing heuristic AGIs which will help in figuring out AGI safety, whether because we can use them for our tests, or because they can help by applying their embryonic general intelligence to the problem. Goertzel and Pitt have urged this approach.

Yet resources are limited. Perhaps the folks who are actually building their own heuristic AGIs are in a better position than SI to develop safety mechanisms for them, while SI is the only organization which is really working on a formal theory on Friendliness, and so should concentrate on that. It could be better to focus SI's resources on areas in which it has a relative advantage, or which have a greater expected impact.

Even if so, SI should evangelize AGI safety to other researchers, not only as a general principle, but also by offering theoretical insights that may help them as they work on their own safety mechanisms.

In summary:

1. AGI development which is unconstrained by a friendliness requirement is likely to beat a provably-friendly design in a race to implementation, and some effort should be expended on dealing with this scenario.

2. Pursuing a provably-friendly AGI, even if very unlikely to succeed, could still be the right thing to do if it was certain that we’ll have a hard takeoff very soon after the creation of the first AGIs. However, we do not know whether or not this is true.

3. Even the provably friendly design will face real-world compromises and errors in its implementation, so the implementation will not itself be provably friendly. Thus, safety protections of the sort needed for heuristic design are needed even for a theoretically Friendly design.
