Steve_Rayhawk 07 March 2012 07:22:58PM* 7 points

That said, I think his fear of culpability (for being potentially passively involved in an existential catastrophe) is very real. I suspect he is continually driven, at a level beneath what anyone's remonstrations could easily affect, to try anything that might somehow succeed in removing all the culpability from him. This would be a double negative form of "something to protect": "something to not be culpable for failure to protect".

If this is true, then if you try to make him feel culpability for his communication acts as usual, this will only make his fear stronger and make him more desperate to find a way out, and make him even more willing to break normal conversational rules.

I don't think he has full introspective access to his decision calculus for how he should let his drive affect his communication practices or the resulting level of discourse. So his above explanations for why he argues the way he does are probably partly confabulated, to match an underlying constraining intuition of "whatever I did, it was less indefensible than the alternative".

(I feel like there has to be some kind of third alternative I'm missing here, that would derail the ongoing damage from this sort of desperate effort by him to compel someone or something to magically generate a way out for him. I think the underlying phenomenon is worth developing some insight into. Alex wouldn't be the only person with some amount of this kind of psychology going on -- just the most visible.)

Steve_Rayhawk 04 March 2012 11:44:05AM 9 points

Currently you suspect that there are people, such as yourself, who have some chance of correctly judging whether arguments such as yours are correct, and of attempting to implement the implications if those arguments are correct, and of not implementing the implications if those arguments are not correct.

Do you think it would be possible to design an intelligence which could do this more reliably?

Steve_Rayhawk 04 March 2012 09:49:51AM* 3 points

I wish there were a more standard term for this than "kinesthetic thinking", one that other people could look up to understand what was meant.

(A related term is "motor cognition", but that doesn't denote a thinking style. Motor cognition is a theoretical paradigm in cognitive psychology, according to which most cognition is a kind of higher-order motor control/planning activity, connected in a continuous hierarchy with conventional concrete motor control and based on the same method of neural implementation. (See also: precuneus (reflective cognition?); compare perceptual control theory.) Another problem with the term "motor cognition" is that it doesn't convey the important nuance of "higher-order motor planning except without necessarily any concurrent processing of any represented concrete motions". (And the other would-be closest option, "kinesthetic learning", actively denotes the opposite.)

Plausibly, people could be trained to introspectively attend to the aspect of cognition that is like motor planning by combining TCMS (to inhibit visual and auditory imagery) with cognitive tasks that involve salient constraints and tradeoffs. Maybe the cognitive tasks would also need to have specific positive or negative consequences for apparent execution of recognizable scripts of sequential actions typical of normally learned plans for the task. Some natural tasks with some of these features, and which are not intrinsically verbal or visual, would be social reasoning, mathematical proof planning, or software engineering.)

when I am thinking kinesthetically I basically never rationalize as such

I think kinesthetic thinking still has things like rationalization. For example, you might have to commit to regarding a certain planned action a certain way as part of a complex motivational gambit, with the side effect that you commit to pretend that the action will have some other expected value than the one you would normally assign. If this ability to make commitments that affect perceived expected value can be used well, then by default this ability is probably also being used badly.

Could you give more details about the things like rationalization that you were thinking of, and what it feels like deciding not to do them in kinesthetic thinking?

Steve_Rayhawk 19 January 2012 07:27:49AM* 4 points

If the human-level AGI

0) is autonomous (has, or forms, long-term goals)
1) is not socialized

 

#1 is important because a self-modifying system will tend to respond to negative reinforcement concerning sociopathic behaviors resulting from #3-- though, it must be admitted, this will depend on how deeply the ability to self-modify runs. Not all architectures will be capable of effectively modifying their goals in response to social pressures. (In fact, rigid goal-structure under self-modification will usually be seen as an important design-point.)

Abram: Could you make this more precise?

From the way you used the concept of "negative reinforcement", it sounds like you have a particular family of agent architectures in mind, which is constrained enough that we can predict that the agent will make human-like generalizations about a reward-relevant boundary between "sociopathic" behaviors and "socialized" ones. It also sounds like you have a particular class of possible emergent socially-enforced concepts of "sociopathic" in mind, which is constrained enough that we can predict that the behaviors permitted by that sociopathy concept wouldn't still be an existential catastrophe from our point of view. But you haven't said enough to narrow things down that much.

For example, you could have two initially sociopathic agents, who successfully manipulate each other into committing all their joint resources to a single composite agent having a weighted combination of their previous utility functions. The parts of this combined agent would be completely trustworthy to each other, so that the agents could be said to have modified their goals in response to the "social pressures" of their two-member society. But the overall agent could still be perfectly sociopathic toward other agents who were not powerful enough to manipulate it into making similar concessions.

Steve_Rayhawk 10 January 2012 01:22:11PM* 1 point

In fact, I'd prefer it if Q8 started out with the less-shibbolethy "How much have you read about, or used the concepts of..." or something like that, which replaces a dichotomy with a continuum.

Yeah... I wanted to make the suggested question less loaded, but it would have required more words, and I was unthinkingly preoccupied with worry about a limit on the permitted complexity of a single-sentence question. Maybe I should have split the question across more sentences.

The signaling uses of Q8 seem like a bad idea to me, although it seems a worthwhile thing to ask for Steve Rayhawk's reasons.

My reasons for suggesting Q8 were mostly:

  • First, I wanted to make it easier to narrow down hypotheses about the relationship between respondents' opinions about AI risk and their awareness of progress toward formal, machine-representable concepts of optimal AI design (also including, I guess, progress toward practically efficient mechanized application of those concepts, as in Schmidhuber's Speed Prior and AIXI-tl).

  • Second, I was imagining that many respondents would be AI practitioners who thought mostly in terms of architectures with a machine-learning flavor. Those architectures usually have a very specific and limited structure in their hypothesis space or policy space by construction, such that it would be clearly silly to imagine a system with such an architecture self-representing or self-improving. These researchers might have a conceptual myopia by which they imagine "progress in AI" to mean only "creation of more refined machine-learning-style architectures", of a sort which of course wouldn't lead towards passing a threshold of capability of self-improvement anytime soon. I wanted to put in something of a conceptual speed bump to that kind of thinking, to reduce unthinking dismissiveness in the answers, and counter part of the polarizing/consistency effects that merely receiving and thinking about answering the survey might have on the recipients' opinions. (Of course, if this had been a survey which was meant to be scientific and formally reportable, it would be desirable for the presence of such a potentially leading question to be an experimentally controlled variable.)

With those reasons on the table, someone else might be able to come up with a question that fulfills them better. I also agree with paulfchristiano's comment.

Steve_Rayhawk 09 January 2012 06:32:48AM* 1 point

However, it's harder to find uncontroversial questions which would be diagnostic of these errors.

Perhaps an expert's beliefs about the costs of better information and the costs of delay might be assessed with a willingness-to-pay question, such as a tradeoff involving a hypothetical benefit to everyone now living on Earth which could be sacrificed to gain hypothetical perfect understanding of some technical unknowns related to AI risks, or a hypothetical benefit gained at the cost of perfect future helplessness against AI risks. However, even this sort of question might seem to frame things hyperbolically.

Steve_Rayhawk 09 January 2012 06:19:57AM* 12 points

I think experts' opinions on the possibility of AI self-improvement may covary with their awareness of work on formal, machine-representable concepts of optimal AI design, particularly Solomonoff induction, including its application to reinforcement learning as in AIXI, and variations of Levin search such as Hutter's algorithm M and Gödel machines. If an expert is unaware of those concepts, this unawareness may serve to explain away the expert's belief that there are no approaches to engineering self-improvement-capable AI on any foreseeable horizon.

If it's not too late, you should probably include a question to judge the expert's awareness of these concepts in your questionnaires, such as:

"Qn: Are you familiar with formal concepts of optimal AI design which relate to searches over complete spaces of computable hypotheses or computational strategies, such as Solomonoff induction, Levin search, Hutter's algorithm M, AIXI, or Gödel machines?"

...bearing in mind that the presence of such a question may affect their other answers.

(This was part of what I was getting at with my analysis of the AAAI panel interim report: "What cached models of the planning abilities of future machine intelligences did the academics have available [...]?" "What fraction of the academics are aware of any current published AI architectures which could reliably reason over plans at the level of abstraction of 'implement a proxy intelligence'?")

Other errors which might explain away an expert's unconcern for AI risk are:

  • incautious thinking about the full implications of a given optimization criterion or motivational system;

  • when considering AI self-improvement scenarios, incautious thinking about parameter uncertainty and structural uncertainty in economic descriptions of computational complexity costs and efficiency gains over time (particularly given that a general AI will be motivated to investigate many different possible structures for the process for self-improvement, including structures one may not oneself have considered, in order to choose a process whose economics are as favorable as possible); and

  • incomplete reasoning about options for gathering information about technical factors affecting AI risk scenarios, when considering the potential relative costs of delaying AI safety projects until better information is available (on the implicit expectation that, in the event that the technical factors turn out to imply safety, delaying will have prevented the cost of the AI safety projects, and (more viscerally) that having advocated delay will prevent one's own loss of prestige, unthinkingly taken as a proxy for correctness, whereas failure to have advocated an immediate start to AI safety projects could not result in loss of one's own prestige in any event).

However, it's harder to find uncontroversial questions which would be diagnostic of these errors.

Steve_Rayhawk 30 December 2011 05:12:30AM 2 points

I'll stake $500 if eligible.

When would the answer need to be known by?

Steve_Rayhawk 10 December 2011 07:17:16AM 0 points

[...] assume that the universe and its initial conditions can be described succinctly and inferred by A, and that the sequence of bits sent over W1 and W2 can be defined using an additional 10000 bits once a description of the universe is in hand.

Do you mean, "once a full description of the universe is in hand", and that the 10000 bits are the complexity of locating W1 and W2 in the full description?

  • A's outputs are fed to the output wire W2, the rest of the universe (including A itself) behaves according to physical law, and A is given the values from input wire W1 as its input. (Model 1)
  • A's outputs are ignored, the rest of the universe behaves according to physical law, and A is given the values from W1 as its input. (Model 2)

Model 2 still has to locate W1. What might be the complexity of locating W2 conditional on having located W1? It seems plausible that this extra complexity would be within a constant of the complexity of describing the AIXI algorithm and something like counterfactual physical reasoning about alternative inputs, to check whether the known behavior of AIXI is consistent with W2 given counterfactual input at W1.

In response to comment by brilee on Log-odds (or logits)
Steve_Rayhawk 30 November 2011 06:43:51PM* 5 points

The natural unit of ratio, the neper (Np), is easier to interpret for small ratio contributions, where the derivative of exp(x) is ≈1:

0.1Np = exp( 0.1) ∶ 1 ≈ 1.1 ∶ 1
-0.1Np = exp(-0.1) ∶ 1 ≈ 0.9 ∶ 1

This could make for an easy upgrade path to using nepers or centinepers instead of percents in comparatives involving rates, which would reduce semantic confusion. "50% faster" can mean "gets 150% as far" (so 0.41Np faster, or 41cNp, or perhaps 41Np%) or "takes 50% as much time" (so 0.69Np faster, or 69cNp, or 69Np%). That's an argument for using nepers as a standard base outside communications of probability.
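The arithmetic above can be checked with a short sketch. The helper names `nepers_to_ratio` and `ratio_to_nepers` are made up for illustration; the only assumption is that a ratio in nepers is the natural logarithm of the plain ratio.

```python
import math


def nepers_to_ratio(np_value: float) -> float:
    """Convert a ratio expressed in nepers to a plain multiplicative ratio."""
    return math.exp(np_value)


def ratio_to_nepers(ratio: float) -> float:
    """Convert a positive multiplicative ratio to nepers (natural log)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.log(ratio)


# Small contributions: +/-0.1Np is close to +/-10%.
small_up = nepers_to_ratio(0.1)
small_down = nepers_to_ratio(-0.1)

# Two readings of "50% faster" give two different neper values.
gets_150_percent_as_far = ratio_to_nepers(1.5)
takes_50_percent_as_much_time = ratio_to_nepers(1 / 0.5)

print(f"+0.1Np -> {small_up:.3f} : 1")
print(f"-0.1Np -> {small_down:.3f} : 1")
print(f"gets 150% as far        -> {gets_150_percent_as_far:.2f}Np")
print(f"takes 50% as much time  -> {takes_50_percent_as_much_time:.2f}Np")
```

One convenience of the log scale is that a speedup and the matching slowdown differ only in sign: `ratio_to_nepers(0.5)` is the negative of `ratio_to_nepers(2.0)`, which percentages do not give you.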

(trivia: Nepers and radians are each other turned sideways, being respectively the real and imaginary parts of eigenvalues of linear differential equation systems.)
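To spell out the trivia, here is a sketch in notation that is mine, not the original comment's: take a linear system with matrix A, state x, and one eigenvalue λ split into real part σ and imaginary part ω.

```latex
\[
\dot{x} = A x, \qquad \lambda = \sigma + i\omega
\quad\Longrightarrow\quad
e^{\lambda t} = e^{\sigma t}\, e^{i\omega t}
             = e^{\sigma t}\,(\cos \omega t + i \sin \omega t)
\]
```

Over a time t, the mode's amplitude changes by a ratio of σt nepers and its phase advances by ωt radians, so the two units measure the real and imaginary directions of the same exponent.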
