Why "new terms"? If the language can finitely express a concept, my scheme gives that concept plausibility. Maybe this could be extended to lengths of programs that generate axioms for a given theory (even enumerable sets of axioms), rather than lengths of individual finite statements, but I guess that can be stated within some logical language just as well.

## FAI Research Constraints and AGI Side Effects

**Ozzie Gooen and Justin Shovelain**

## Summary

Friendly artificial intelligence (FAI) researchers face at least two significant challenges. First, they must produce a large amount of FAI research in a short amount of time. Second, they must do so without producing enough artificial general intelligence (AGI) research to result in the creation of an unfriendly artificial intelligence (UFAI). We estimate the requirements of both of these challenges using two simple models.

Our first model describes a *friendliness ratio* and a *leakage ratio* for *FAI research projects*. These provide limits on the allowable amount of artificial general intelligence (AGI) knowledge produced per unit of FAI knowledge in order for a project to be net beneficial.

Our second model studies a hypothetical *FAI venture*, which is responsible for ensuring FAI creation. We estimate the total FAI research the venture must produce per year and the leakage ratio of that research. This model demonstrates a trade-off between the speed of FAI research and the proportion of AGI research that can be revealed as part of it. If FAI research takes too long, then the acceptable leakage ratio may become so low that it would be nearly impossible to safely produce any new research.

By new "term" I meant to make it clear that this statement points to an operation that cannot be done with the original machine. Instead, it calls a new module (say, a halting oracle) that didn't exist originally.

I haven't studied algorithmic probability literature in-depth, but it naively seems to me that one can straightforwardly extend the idea of universal probability to arbitrary logical languages, thus becoming able to assign plausibility to all mathematical structures. The same principle as with universal prior, but have a *statement* valued by the length of the shortest equivalent statement (from no non-logical axioms), and consequently a class of structures gets value from a statement describing it. This takes care of not noticing halting oracles and so on, you just need to let go of the standard theory/model of programs-only.

Are you trying to express the idea of adding new fundamental "terms" to your language describing things like halting oracles and such? And then discounting their weight by the shortest statement of said term's properties expressed in the language that existed previously to including this additional "term?" If so, I agree that this is the natural way to extend priors out to handle arbitrary describable objects such as halting oracles.

Stated another way. You start with a language L. Let the definition of an esoteric mathematical object (say a halting oracle) E be D in the original language L. Then the prior probability of a program using that object is discounted by the description length of D. This gives us a prior over all "programs" containing arbitrary (describable) esoteric mathematical objects in their description.
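The discounting described above can be sketched in a few lines. This is only a toy illustration under stated assumptions: programs and definitions are represented as made-up bit strings, the weight is the unnormalized 2^(-length) used by the universal prior, and the names `prior_weight`, `definitions`, and `used_objects` are hypothetical, not from any existing library. It ignores prefix-free coding and normalization.

```python
def prior_weight(program_bits: str, definitions: dict[str, str], used_objects: list[str]) -> float:
    """Unnormalized prior weight of a program that may call esoteric objects.

    program_bits: the program's own description in language L (toy bit string).
    definitions:  maps each esoteric object E to its definition D in L (toy bit strings).
    used_objects: the esoteric objects the program refers to.
    """
    # Each distinct object is paid for once, by the length of its definition D.
    definition_cost = sum(len(definitions[name]) for name in set(used_objects))
    total_length = len(program_bits) + definition_cost
    return 2.0 ** -total_length


# Hypothetical 6-bit definition of a halting oracle in the base language L.
definitions = {"halting_oracle": "110010"}

plain = prior_weight("1011", definitions, [])
with_oracle = prior_weight("1011", definitions, ["halting_oracle"])
print(plain, with_oracle)
```

In this sketch, a program that uses the oracle is penalized by exactly the extra bits needed to state D, so a shorter definition makes the esoteric object cheaper to invoke.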

I'm not yet sure how universal this approach is at allowing arbitrary esoteric mathematical objects (appealing to the Church-Turing thesis here would be assuming the conclusion) and am uncertain whether we can ignore the ones it cannot incorporate.

Interesting idea.

I agree that trusting newly formed ideas is risky, but there are several reasons to convey them anyway (non-comprehensive listing):

To recruit assistance in developing and verifying them

To convey an idea that is obvious in retrospect, an idea you can be confident in immediately

To signal cleverness and ability to think on one's feet

To socially play with the ideas

What we are really after, though, is to assess how much weight to assign to an idea off the bat, so we can calculate the opportunity costs of thinking about the idea in greater detail and of asking for the idea to be fleshed out and conveyed fully. This overlaps somewhat with the confidence with which the speaker is conveying the idea (determining that confidence depends on context-sensitive rules). Also, how do you gauge how old an idea really is? Especially if it condenses gradually or is a simple combination of very old parts? Still... some metric is better than no metric.

<Thought about for 1 minute. Written up in 5 minutes.>

## Sequential Organization of Thinking: "Six Thinking Hats"

Many people move chaotically from thought to thought without explicit structure. Inappropriate structuring may leave blind spots or cause the gears of thought to grind to a halt, but the advantages of appropriate structuring are immense:

- It ensures that you examine all relevant facets of an issue, idea, or fact.

- It ensures you know what to do next at every stage and are not frustrated or crippled by akrasia between moments of choice; the next action is always obvious.
- It minimizes the overhead of task switching: you are in control and do not dither between possibilities.
- It may be used in a social context so that potentially challenging issues and thoughts may be brought up in a non-threatening manner (let's look at the positive aspects, now let's focus purely on the negative...).

To illustrate thought structuring, I use the example of Edward de Bono's "six thinking hats" mnemonic. With this method, you metaphorically put on various colored "hats" (perspectives) and switch "hats" depending on the task. I will use the somewhat controversial issue of cryonics as my running example.^{1}

Poll: Do you have older siblings or are an only child?

Vote this up if you are the oldest child with siblings.

Vote this up if you are an only child.

Vote this up if you have older siblings.
