Less Wrong is a community blog devoted to refining the art of human rationality.

Comment author: JoshuaFox 29 March 2017 05:35:11PM 1 point

Eliezer is still writing AI Alignment content on it, ... MIRI ... adopt Arbital ...

How does Eliezer's work on Arbital relate to MIRI? Little is publicly visible of what he is doing at MIRI. Is he focusing on Arbital? What is the strategic purpose?

Comment author: luminosity 15 January 2017 12:42:54PM 0 points

How did you go about finding people to come to your meetups? Pre-existing friends? LW site postings? Something else?

Comment author: JoshuaFox 19 January 2017 09:52:55AM 1 point

Pre-existing friends, postings on Facebook (even though FB does not distribute events to the timelines of group members if there are more than 250 people in a group), and occasionally lesswrong.com (not event postings, but rather people who are actively interested in LW seeking out a Tel Aviv group).

In response to Meetup Discussion
Comment author: JoshuaFox 11 January 2017 07:42:15AM *  1 point

In Tel Aviv, we have three types of meetings, all on Tuesdays. Monthly we have a full meeting, usually a lecture or sometimes Rump Sessions (informal lightning talks). Typical attendance is 12.

Monthly, in the fortnights alternating with the above, we hold game nights.

We are graciously hosted by Meni Rosenfeld's Cluster startup hub. (For a few years we were hosted at Google.)

On other Tuesdays a few LessWrongers get together at a pub.

Comment author: JoshuaFox 04 January 2017 09:06:54AM 0 points

There certainly should be more orgs with different approaches. But possibly, CHCAI plays a role as the representative of MIRI in the mainstream academic world, and so from the perspective of goals, it is OK that the two are quite close.

In response to Gatekeeper variation
Comment author: kilobug 10 August 2015 01:41:27PM 1 point

This won't work, like all other similar schemes, because you can't "prove" the gatekeeper down to the quark level of its hardware (so you're vulnerable to some kind of side-attack, like the memory bit-flipping attack that was discussed recently), nor shield the AI from communicating through side channels (for example, varying the temperature of its internal processing unit, which in turn will influence the air conditioning system, ...).

And that's not even considering that the AI could actually discover new physics (new particles, ...) and have some ability to manipulate them with its own hardware.

This whole class of approaches can't work, because there are just too many avenues for side-attacks and covert communication channels, and you can't formally prove that none of them are available without making the proof over the whole system (AI + gatekeeper + power generator + air conditioner + ...) down at the Schrödinger-equation level.

Comment author: JoshuaFox 10 August 2015 01:58:48PM 0 points

You're quite right--these are among the standard objections to boxing, as mentioned in the post. However, AI boxing may have value as a stopgap at an early stage, so I'm wondering about the idea's value in that context.

Comment author: Luke_A_Somers 08 August 2015 03:28:39AM 1 point

Seems like your usual 'independently verify everything the AI says' concept, only way more restrictive.

Comment author: JoshuaFox 08 August 2015 06:21:09PM 0 points

Sure, but to "independently verify" the output of an entity smarter than you is generally impossible. This scheme makes verification possible, while also limiting the boxed AI's freedom to choose its answers.

In response to Gatekeeper variation
Comment author: Luke_A_Somers 07 August 2015 01:50:22PM 4 points

Objection 1 seems really strong. The kinds of problems that AGI would be better at than non-general-intelligences are those with ambiguity. If it was just a constraint-solver, it wouldn't be a threat in the first place.

Similarly, with such a restricted output channel, there's little-to-no point in making it have agency to begin with. We're deep in 'tool AI' territory. The incentives to leave this territory would remain.

Comment author: JoshuaFox 07 August 2015 03:00:18PM 0 points

Thanks. Those points are correct. Is there any particular weakness or strength to this UP-idea in contrast to Oracle, tool-AI, or Gatekeeper ideas?

Comment author: Kaj_Sotala 29 June 2015 06:18:18PM 7 points

The only one I can think of (though I can't find the specific article) is Goertzel's description of an architecture where the guardian component is separate from the main AGI

You're probably thinking of GOLEM. The Gödel machine is another proposal along somewhat similar lines.

Some discussions more directly related to your suggestion could be:

Our proposed protocol is based on the idea of asking what we will call ‘safe questions’ with restricted answers. We define a safe question as one for which a human being could find an answer without any help from superintelligence, all on his own merit. Why is this useful? Suppose that a scientist working on a cure for cancer is looking at two potential cures. Each one could be developed and tested without assistance from the AI, but would take about 3 years of hard work each. Which one to try first if the scientist believes that they have about an equal chance of working? If a wrong option is chosen to be tested first, humanity will still obtain a cure for cancer but it will happen 3 years later. What if we could ask the AI to suggest which option to try first? [...]

To make sure that a potential question is safe, a group of experts could be used to review it. The experts should all be trained AI safety engineers, meaning that they are familiar with the design of the AI and its confinement environment as well as the latest developments in machine ethics (Allen et al., 2006; Anderson & Anderson, 2007; Hall, 2000; Moor, 2006; Tonkens, 2009). Experts may also need to be trained in computer psychology, a currently non-existent profession which might become a reality in the future (Epstein, 1997). [...]

The limited outbound channel shall consist of binary (0 or 1) answers next to original unmodified safe questions presented by the intermediary non-intelligent software in random order, so as to avoid communication via covert timing or resource channels, which make steganography possible. The output should in no way be controllable by the AI, outside of the multiple choice menu.

  • "Thinking inside the box" (section 3.1) also has some discussion about restricting an AGI's outputs to be safe by selecting the questions to be asked appropriately. There was also some discussion along these lines in the beginning of chap. 10 of Superintelligence, e.g., running multiple copies of different Oracles and comparing their answers with each other to avoid deception - "the answers of the different oracles have to match each other" could be a simple and formally definable criterion.
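A minimal sketch of that matching criterion (all names here are illustrative stand-ins, not part of any published proposal):

```python
# Toy sketch: release an oracle's answer only when independent copies agree.
# The "oracles" are represented by their answer strings; a real system would
# run isolated instances of the AI and compare their outputs mechanically.

def matching_answer(oracle_answers):
    """Return the shared answer if all oracles agree, else None."""
    answers = set(oracle_answers)
    if len(answers) == 1:
        return answers.pop()
    return None  # disagreement: withhold the output entirely

# Three stand-in oracles asked the same fixed question:
print(matching_answer(["option_a", "option_a", "option_a"]))  # agreement -> option_a
print(matching_answer(["option_a", "option_b", "option_a"]))  # disagreement -> None
```

The point of the sketch is only that "all answers match" is the kind of predicate a simple, non-intelligent checker can evaluate without trusting any single oracle.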
Comment author: JoshuaFox 30 June 2015 04:49:03AM *  4 points

Thank you, Kaj. Those references are what I was looking for.

It looks like there might be a somewhat new idea here. Previous suggestions, as you mention, restrict output to a single bit, or require review by human experts. Using multiple AGI oracles to check each other is a good one, though I'd worry about acausal coordination between the AGIs, and I don't see that the safety is provable beyond checking that answers match.

This new variant gives the benefit of provable restrictions and the relative ease of implementing a narrow-AI proof system to check it. It's certainly not the full solution to the FAI problem, but it's a good addition to our lineup of partial or short-term solutions in the area of AI Boxing and Oracle AI.

I'll get this feedback to the originator of this idea and see what can be made of it.

Comment author: JoshuaFox 29 June 2015 05:16:03AM *  2 points

Could someone point me to any existing articles on this variant of AI-Boxing and Oracle AGIs:

The boxed AGI's gatekeeper is a simpler system which runs formal proofs to verify that the AGI's output satisfies a simple, formally definable constraint. The constraint is not "safety" in general but rather is narrow enough that we can be mathematically sure that the output is safe. (This does limit the potential benefits from the AGI.)
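To make the setup concrete, here is a toy illustration (the constraint and all names are purely hypothetical stand-ins; a real gatekeeper would run a formal proof checker, not a Python predicate):

```python
# Toy sketch of the gatekeeper idea: untrusted AGI output is released only
# if a simple, mechanically checkable predicate accepts it. The constraint
# here (output must be a single binary answer) is a deliberately narrow
# stand-in for a formally definable safety property.

def satisfies_constraint(output):
    """Narrow, mechanically checkable constraint on the AGI's output."""
    return output in ("0", "1")  # e.g. a single-bit answer to a fixed question

def gatekeeper(agi_output):
    """Release the output only if the constraint verifiably holds."""
    if satisfies_constraint(agi_output):
        return agi_output
    return None  # anything else is discarded, never shown to humans

print(gatekeeper("1"))         # passes the constraint -> released
print(gatekeeper("trust me"))  # fails the constraint -> None
```

The design choice being illustrated: the gatekeeper never needs to understand the AGI's reasoning, only to check a fixed predicate, which is what makes verification by a weaker system feasible.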

The question of what the constraint should be remains open, and of course the fact that the AGI is physically embodied puts it in causal contact with the rest of the universe. But as a partial or short-term solution, has anyone written about it? The only one I can think of (though I can't find the specific article) is Goertzel's description of an architecture where the guardian component is separate from the main AGI.

Comment author: JoshuaFox 22 June 2015 12:31:41PM *  0 points

Duplicate of Anatoly's post.
