Here's a scenario that doesn't seem completely implausible. Suppose Bob is someone whose public key is easily available on the internet. The first AI will read things on the internet and output a message. Some of the message will get put on the public internet. Bob suspects that the AI might have secretly sent him a message (e.g. giving him advice on which stocks to buy). So he tries using his private key to decrypt some of the AI's output (e.g. the lowest-order bits in some images the AI has output).
Knowing that Bob (or someone else like Bob) will li...
The trouble is that it's much easier to create a steganographic message (e.g. encrypting a message using a particular public key) than to detect it (which requires knowing the private key or otherwise breaking the encryption). So in this case "much more computing power" has to mean "exponentially more computing power".
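To make the asymmetry concrete, here's a minimal sketch of the embedding side (my own toy example; the ciphertext is a placeholder standing in for a message already encrypted to Bob's public key, and the "image" is just a flat list of 8-bit pixel values):

def embed_lsb(pixels, payload):
    """Write each bit of payload into the least significant bit of a pixel."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    assert len(bits) <= len(pixels), "image too small for payload"
    stego = list(pixels)
    for idx, bit in enumerate(bits):
        stego[idx] = (stego[idx] & ~1) | bit  # clear the LSB, then set it to the payload bit
    return stego

def extract_lsb(pixels, num_bytes):
    """Read num_bytes back out of the least significant bits."""
    out = bytearray()
    for byte_index in range(num_bytes):
        byte = 0
        for i in range(8):
            byte |= (pixels[byte_index * 8 + i] & 1) << i
        out.append(byte)
    return bytes(out)

ciphertext = b"\x8f\x03\xa2\x47"    # stand-in for the output of some public-key encryption
image = [128] * 64                  # stand-in for real pixel data
stego_image = embed_lsb(image, ciphertext)
assert extract_lsb(stego_image, len(ciphertext)) == ciphertext

Embedding and extraction are both cheap; the expensive part falls entirely on a monitor without Bob's private key, for whom the low-order bits look like ordinary pixel noise.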
You might be interested in reading:
https://medium.com/ai-control/mimicry-maximization-and-meeting-halfway-c149dd23fc17#.v6e533hkf
https://medium.com/ai-control/elaborations-of-apprenticeship-learning-eb93a53ae3ca#.5ubczdqf0
https://intelligence.org/files/QuantilizersSaferAlternative.pdf
This prevents the first AI from doing evil things with its output. If it tries to insert complicated infohazards or subagents into its output stream, it will be easily detected as an AI. Instead it needs to mimic humans as closely as possible.
Note that steganography is s...
One of the most common objections I've seen is that we're too far from getting AGI to know what AGI will be like, so we can't productively work on the problem without making a lot of conjunctive assumptions -- e.g. see this post.
And before I could scribble a damned thing, Calude went and solved it six months ago. The Halting Problem, I mean.
Cool. If I understand the result correctly, it's that if you run a random program for some number of steps and it doesn't halt, then (depending on the exact numbers) it will be unlikely to halt when run on a supercomputer either, because halting times have low density. So almost all programs either halt quickly or run for a really, really long time. Is this correct? This doesn't quite let you approximate Chaitin's omega, but it's interesting tha...
It's not a specific programming language, I guess it's meant to look like Church. It could be written as:
(query
. (define a (p))
. (foreach (range n)
. . (lambda (i)
. . . (define x (x-prior))
. . . (factor (log (U x a))))))
Well so does the sigmoided version
It samples an action proportional to p(a) E[sigmoid(U) | a]. This can't be written as a function of E[U | a].
Due to the planning model, the successor always has some nonzero probability of not pressing the button, so (depending on how much you value pressing it later) it'll be worth it to press it at some point.
When you use e-raised-to-the alpha times expectation, is that similar to the use of an exponential distribution in something like Adaboost, to take something like odds information and form a distribution over assorted weights?
I'm not really that familiar with Adaboost. The planning model is just reflecting the fact that bounded agents don't always take the maximum expected utility action. The higher alpha is, the more bias there is towards good actions, but the more potentially expensive the computation is (e.g. if you use rejection sampling).
...Since
Your model selects an action proportional to p(a) E[sigmoid(U) | a], whereas mine selects an action proportional to p(a) e^E[U | a]. I think the second is better, because it actually treats actions the same if they have the same expected utility. The sigmoid version will not take very high utilities or very low utilities into account much.
Btw it's also possible to select an action proportional to E[U | a]^n:
query {
. a ~ p()
. for i = 1 to n
. . x_i ~ P(x)
. . factor(log U(x_i, a))
}
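To make the difference concrete, here's a toy numerical sketch (my own numbers, with deterministic utilities so that E[sigmoid(U) | a] = sigmoid(E[U | a]) and the rules can be compared directly; the action names are purely illustrative):

import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

# Two actions with equal prior p(a); utilities are deterministic here.
utilities = {"decent_action": 2.0, "great_action": 10.0}
n = 3  # exponent for the E[U | a]^n rule

for name, u in utilities.items():
    print(name,
          "sigmoid score:", round(sigmoid(u), 5),
          "e^E[U] score:", round(math.exp(u), 1),
          "E[U]^n score:", round(u ** n, 1))

# The sigmoid scores differ by only about 14%, so sampling proportional to
# them treats the two actions almost identically; the e^E[U | a] scores
# differ by a factor of e^8 (about 3000), and the E[U | a]^3 scores by 125.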
(BTW: here's a writeup of one of my ideas for writing planning queries that you might be interested in)
Often we want a model where the probability of taking action a is proportional to p(a) e^E[U(x, a)] (with the expectation taken over x), where p is the prior over actions, x consists of some latent variables, and U is the utility function. The straightforward way of doing this fails:
query {
. a ~ p()
. x ~ P(x)
. factor(U(x, a))
}
Note that I'm assuming factor takes a log probability as its argument. This fails due to "wishful thinking": it tends to prefer ris...
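Here's a toy numerical sketch of that failure mode (my own numbers, using exhaustive enumeration rather than an actual probabilistic-programming query): a "risky" action whose utility is 0 or 1 with equal probability beats a "safe" action with certain utility 0.6 under factor(U), even though its expected utility is lower, because the factor rewards E[e^U | a] rather than e^E[U | a].

import math

# Two actions, uniform prior p(a).
# "safe":  utility 0.6 with certainty.
# "risky": utility 0 or 1, each with probability 0.5 (expected utility 0.5).
outcomes = {
    "safe":  [(1.0, 0.6)],
    "risky": [(0.5, 0.0), (0.5, 1.0)],
}

for action, dist in outcomes.items():
    expected_u = sum(p * u for p, u in dist)
    # Weight induced by factor(U), i.e. marginalizing e^U over the latent x:
    wishful_weight = sum(p * math.exp(u) for p, u in dist)
    # Weight we actually wanted, e^(E[U | a]):
    intended_weight = math.exp(expected_u)
    print(action, "E[U] =", expected_u,
          "E[e^U] =", round(wishful_weight, 3),
          "e^E[U] =", round(intended_weight, 3))

# The risky action gets the larger E[e^U] weight (1.859 vs 1.822) despite
# the lower expected utility -- the query "wishfully" infers that the
# latent variables worked out well.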
ZOMFG, can you link to a write-up? This links up almost perfectly with a bit of research I've been wanting to do.
Well, a write-up doesn't exist because I haven't actually done the math yet :)
But the idea is about algorithms for doing nested queries. There's a planning framework where you take action a proportional to p(a) e^E[U | a]. If one of these actions is "defer to your successor", then the computation of (U | a) is actually another query that samples a different action b proportional to p(b) e^E[U | b]. In this case you can actually j...
Yes, something like that, although I don't usually think of it as an adversary. Mainly it's so I can ask questions like "how could a FAI model its operator so that it can infer the operator's values from their behavior?" without getting hung up on the exact representation of the model or how the model is found. We don't have any solution to this problem, even if we had access to a probabilistic program induction black box, so it would be silly to impose the additional restriction that we can't give the black box any induction problems that are...
I should be specific that the kinds of results we want to get are those where you could, in principle, use a very powerful computer instead of a hypercomputer. Roughly, the unbounded algorithm should be a limit of bounded algorithms. The kinds of allowed operations I am thinking about include:
In all these cases, you can get arbitrarily good a...
Thanks for the detailed response! I do think the framework can still work with my assumptions. The way I would model it would be something like:
This model seems quite a bit different from mine, which is that FAI research is about reducing FAI to an AGI problem, and solving AGI takes more work than doing this reduction.
More concretely, consider a proposal such as Paul's reflective automated philosophy method, which might be able to be implemented using episodic reinforcement learning. This proposal has problems, and it's not clear that it works -- but if it did, then it would have reduced FAI to a reinforcement learning problem. Presumably, any implementations of this proposal would benefit from ...
I agree that choosing an action randomly (with higher probability for good actions) is a good way to create a fuzzy satisficer. Do you have any insights into how to:
create queries for planning that don't suffer from "wishful thinking", with or without nested queries. Basically the problem is that if I want an action conditioned on receiving a high utility (e.g. we have a factor on the expected utility node U equal to e^(alpha * U) ), then we are likely to choose high-variance actions while inferring that the rest of the model works out such that these actions return high utilities
extend this to sequential planning without nested nested nested nested nested nested queries
I would tend to say that you should be training a conceptual map of the world before you install anything like action-taking capability or a goal system of any kind.
This seems like a sane thing to do. If this didn't work, it would probably be because either
lack of conceptual convergence and human understandability; this seems somewhat likely and is probably the most important unknown
our conceptual representations are only efficient for talking about things we care about because we care about these things; a "neutral" standard such as reso
Regularization is already a part of training any good classifier.
A technical point here: we don't learn a raw classifier, because that would just learn human judgments. In order to allow the system to disagree with a human, we need to use some metric other than "is simple and assigns high probability to human judgments".
...For something like FAI, I want a concept-learning algorithm that will look at the world in this naturalized, causal way (which is what normal modelling shoots for!), and that will model correctly at any level of abstraction
Okay, thanks a lot for the detailed response. I'll explain a bit about where I'm coming from with understanding the concept learning problem:
I think you have homed in exactly on the place where the disagreement is located. I am glad we got here so quickly (it usually takes a very long time, when it happens at all).
Yes, it is the fact that "weak constraint" systems have (supposedly) the property that they are making the greatest possible attempt to find a state of mutual consistency among the concepts, that leads to the very different conclusions that I come to, versus the conclusions that seem to inhere in logical approaches to AGI. There really is no underestimating the drastic di...
We can do something like list a bunch of examples, have humans label them, and then find the lowest Kolmogorov complexity concept that agrees with human judgments in, say, 90% of cases. I'm not sure if this is what you mean by "normatively correct", but it seems like a plausible concept that multiple concept learning algorithms might converge on. I'm still not convinced that we can do this for many value-laden concepts we care about and end up with something matching CEV, partially due to complexity of value. Still, it's probably worth systematically studying the extent to which this will give the right answers for non-value-laden concepts, and then see what can be done about value-laden concepts.
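As a toy illustration of that procedure (my own sketch; Kolmogorov complexity is uncomputable, so this substitutes a crude description-length proxy, namely enumerating conjunctions of feature tests shortest-first, and the feature names and simplest_agreeing_concept helper are purely illustrative):

from itertools import combinations, product

# Toy stand-in for "find the lowest-complexity concept agreeing with >= 90%
# of human judgments": examples are dicts of boolean features, and candidate
# concepts are conjunctions of feature tests, enumerated shortest-first.

def make_concept(literals):
    # literals: list of (feature_name, required_value) pairs
    return lambda ex: all(ex[f] == v for f, v in literals)

def simplest_agreeing_concept(examples, human_labels, feature_names, threshold=0.9):
    for size in range(len(feature_names) + 1):        # shortest descriptions first
        for feats in combinations(feature_names, size):
            for values in product([True, False], repeat=size):
                concept = make_concept(list(zip(feats, values)))
                agreement = sum(concept(ex) == lab
                                for ex, lab in zip(examples, human_labels)) / len(examples)
                if agreement >= threshold:
                    return feats, values
    return None

# Hypothetical usage: labels track "is_sentient", with one noisy label.
examples = [{"is_sentient": s, "is_cute": c} for s in (True, False) for c in (True, False)] * 3
labels = [ex["is_sentient"] for ex in examples]
labels[0] = not labels[0]                             # 1 of 12 labels flipped
print(simplest_agreeing_concept(examples, labels, ["is_sentient", "is_cute"]))
# Returns the single-feature concept, since it still agrees with 11/12 labels.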
Thanks for your response.
The AI can quickly assess the "forcefulness" of any candidate action plan by asking itself whether the plan will involve giving choices to people vs. forcing them to do something whether they like it or not. If a plan is of the latter sort, more care is needed, so it will canvass a sample of people to see if their reactions are positive or negative.
So, I think this touches on the difficult part. As humans, we have a good idea of what "giving choices to people" vs. "forcing them to do something" l...
Thanks for posting this; I appreciate reading different perspectives on AI value alignment, especially from AI researchers.
...But, truthfully, it would not require a ghost-in-the-machine to reexamine the situation if there was some kind of gross inconsistency with what the humans intended: there could be some other part of its programming (let’s call it the checking code) that kicked in if there was any hint of a mismatch between what the AI planned to do and what the original programmers were now saying they intended. There is nothing difficult or intrinsi
I am going to have to respond piecemeal to your thoughtful comments, so apologies in advance if I can only get to a couple of issues in this first response.
Your first remark, which starts
If there is some good way...
contains a multitude of implicit assumptions about how the AI is built, and how the checking code would do its job, and my objection to your conclusion is buried in an array of objections to all of those assumptions, unfortunately. Let me try to bring some of them out into the light:
1) When you say
...If there is some good way of explaining
So, you can compress a list of observations about which Turing machines halt by starting with a uniform prior over Chaitin's omega. This can lead to quite a lot of compression: the information of whether the first n Turing machines halt consists of n bits, but only requires log(n) bits of Chaitin's omega. If we saw whether more Turing machines halted, we would also uncover more bits of Chaitin's omega. Is this the kind of thing you are thinking of?
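For reference, my paraphrase of the standard argument behind that compression, for a prefix-free universal machine U:

\Omega = \sum_{p \,:\, U(p)\ \text{halts}} 2^{-|p|}, \qquad \omega_t = \sum_{p\ \text{halts within } t \text{ steps}} 2^{-|p|} \nearrow \Omega

Given the first m bits of Omega, i.e. a rational Omega_m with Omega_m <= Omega < Omega_m + 2^{-m}, dovetail all programs until omega_t >= Omega_m. Any program of length at most m that hasn't halted by then would contribute at least 2^{-m} and push the sum past Omega, so it never halts. The first n machines in a standard enumeration have descriptions of length about log2(n) bits, which is why roughly log(n) bits of Omega settle which of them halt.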
I guess there's another question of how any of this makes sense if the universe is computable. We can stil...
One part I'm not clear on is how the empirical knowledge works. The equivalent of "kilograms of mass" might be something like bits of Chaitin's omega. If you have n bits of Chaitin's omega, you can solve the halting problem for any Turing machine of length up to n. But, while you can get lower bounds on Chaitin's omega by running Turing machines and seeing which halt, you can't actually learn upper bounds on Chaitin's omega except by observing uncomputable processes (for example, a halting oracle confirming that some Turing machine doesn't hal...
There's one scenario described in this paper on which this decision theory gives in to blackmail:
...The Retro Blackmail problem. There is a wealthy intelligent system and an honest AI researcher with access to the agent’s original source code. The researcher may deploy a virus that will cause $150 million each in damages to both the AI system and the researcher, and which may only be deactivated if the agent pays the researcher $100 million. The researcher is risk-averse and only deploys the virus upon becoming confident that the agent will pay up. The age
It's possible to compute whether each machine halts using an inductive Turing machine like this:
initialize output tape to all zeros, representing the assertion that no Turing machine halts
for i = 1 to infinity
. for j = 1 to i
. . run Turing machine j for i steps
. . if it halts: set bit j in the output tape to 1
Is this what you meant? If so, I'm not sure what this has to do with observing loops.
When you say that every nonhalting Turing machine has some kind of loop, do you mean the kind of loop that many halting Turing machines also contain?
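As an aside, the dovetailing pattern in that pseudocode is easy to sketch in ordinary code if you let Python generators stand in for Turing machines (a toy of my own: a "machine" whose generator is exhausted counts as halting, and running it "for i steps" here means i further steps, since generators keep their state):

import itertools

def machine_family(j):
    # Toy enumeration: odd-indexed machines loop forever,
    # even-indexed machines halt after j steps.
    if j % 2 == 1:
        return itertools.count()        # never exhausted: "does not halt"
    return iter(range(j))               # exhausted after j steps: "halts"

def dovetail(num_machines, num_rounds):
    halted = [False] * num_machines     # output tape: initially "nothing halts"
    machines = [machine_family(j) for j in range(num_machines)]
    for i in range(1, num_rounds + 1):
        for j in range(min(i, num_machines)):
            if halted[j]:
                continue
            try:
                for _ in range(i):      # run machine j for i more steps
                    next(machines[j])
            except StopIteration:
                halted[j] = True        # flip bit j once machine j halts
    return halted

print(dovetail(num_machines=8, num_rounds=20))
# Every bit that will ever flip to True does so at some finite round;
# bits for non-halting machines stay False forever (the "limit" answer).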
Thanks for the response. I should note that we don't seem to disagree on the fact that a significant portion of AI safety research should be informed by practical considerations, including current algorithms. I'm currently getting a masters degree in AI while doing work for MIRI, and a substantial portion of my work at MIRI is informed by my experience with more practical systems (including machine learning and probabilistic programming). The disagreement is more that you think that unbounded solutions are almost entirely useless, while I think they are...
Learning how to create even a simple recommendation engine whose output is constrained by the values of its creators would be a large step forward and would help society today.
I think something showing how to do value learning on a small scale like this would be on topic. It might help to expose the advantages and disadvantages of algorithms like inverse reinforcement learning.
I also agree that, if there are more practical applications of AI safety ideas, this will increase interest and resources devoted to AI safety. I don't really see those applic...
I don't have a great understanding of the history of engineering, but I get the impression that working from the theory backwards can often be helpful. For example, Turing developed the basics of computer science before sufficiently general computers existed.
The first computer was designed by Babbage, who was mostly interested in practical applications (although admittedly it was never built). About 100 years later, Konrad Zuse developed the first working computer, also for practical purposes. I'm not sure if he was even aware of Turing's work.
Not that Tu...
I think a post saying something like "Deep learning architectures are/are not able to learn human values because of reasons X, Y, Z" would definitely be on topic. As an example of something like this, I wrote a post on the safety implications of statistical learning theory. However, an article about how deep learning algorithms are performing on standard machine learning tasks is not really on topic.
I share your sentiment that safety research is not totally separate from other AI research. But I think there is a lot to be done that does not re...
This is an interesting approach. The way I'm currently thinking of this is that you ask what agent a UDT would design, and then do what that agent does, and vary what type an agent is between the different designs. Is this correct?
Consider the anti-Newcomb problem with Omega's simulation involving equation (2)
So is this equation (2) with P replaced with something else?
However, the computing power allocated for evaluation the logical expectation value in (2) might be sufficient to suspect P's output might be an agent reasoning based on (2).
I don't understand this sentence.
It still seems like this is very much affected by the measure you assign to different game of life universes, and that the measure strongly depends on f.
Suppose we want to set f to control the agent's behavior, so that when it sees sensory data s, it takes silly action a(s), where a is a short function. To work this way, f will map game of life states in which the agent has seen s and should take action a(s) to binary strings that have greater measure, compared to game of life states in which the agent has seen s and should take some other action. I thin...
Thanks for the additional explanation.
It is of similar magnitude to differences between using different universal Turing machines in the definition of the Solomonoff ensemble. These differences become negligible for agents that work with large amounts of evidence.
Hmm, I'm not sure that this is something that you can easily get evidence for or against? The 2^K factor in ordinary Solomonoff induction is usually considered fine because it can only cause you to make at most K errors. But here it's applying to utilities, which you can't get evidence for ...
I think you're essentially correct about the problem of creating a utility function that works across all different logically possible universes being important. This is kind of like what was explored in the ontological crisis paper. Also, I agree that we want to do something like find a human's "native domain" and map it to the true reality in order to define utility functions over reality.
I think using something like Solomonoff induction to find multi-level explanations is a good idea, but I don't think your specific formula works. It looks ...
So, you can kill a person, create a new person, and raise them to be about equivalent to the original person (on average; this makes a bit more sense if we do it many times so the distribution of people, life outcomes, etc is similar). I guess your question is, why don't we do this (aside from the cost)? A few reasons come to mind:
I said that fetuses are replaceable, not that all people are replaceable. OP didn't argue that fetuses weren't replaceable, just that they won't get replaced in practice.
I don't think you did justice to the replaceability argument. If fetuses are replaceable, then the only benefit of banning abortion is that it increases the fertility rate. However, there are far better ways to increase the fertility rate than banning abortion. For example, one could pay people to have children (and maybe give them up for adoption). So your argument is kind of like saying that since we really need farm laborers, we should allow slavery.
I think a useful meaning of "incomparable" is "you should think a very long time before deciding between these". In situations like these, the right decision is not to immediately decide between them, but to think a lot about the decision and related issues. Sure, if someone has to make a split-second decision, they will probably choose whichever sounds better to them. But if given a long time, they might think about it a lot and still not be sure which is better.
This seems a bit similar to multiple utility functions in that if you h...
We updated on the fact that we exist. SSA does this a little too: specifically, the fact that you exist means that there is at least one observer. One way to look at it is that there is initially a constant number of souls that get used to fill in the observers of a universe. In this formulation, SIA is the result of the normal Bayesian update on the fact that soul-you woke up in a body.
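A toy version of that update (my own numbers; the universe names and soul count are purely illustrative):

# N souls; each universe instantiates some of them as observers.
# Conditioning on "soul-you woke up in a body" reproduces the SIA weighting.
N_SOULS = 100
prior = {"universe_A": 0.5, "universe_B": 0.5}
observers = {"universe_A": 1, "universe_B": 10}

likelihood = {u: observers[u] / N_SOULS for u in prior}   # P(you are instantiated | u)
unnormalized = {u: prior[u] * likelihood[u] for u in prior}
total = sum(unnormalized.values())
posterior = {u: w / total for u, w in unnormalized.items()}

print(posterior)   # {'universe_A': 0.0909..., 'universe_B': 0.909...}
# Posterior odds 1:10, proportional to observer counts -- i.e. SIA.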
Transcript:
Question: Are you as afraid of artificial intelligence as your Paypal colleague Elon Musk?
Thiel: I'm super pro-technology in all its forms. I do think that if AI happened, it would be a very strange thing. Generalized artificial intelligence. People always frame it as an economic question, it'll take people's jobs, it'll replace people's jobs, but I think it's much more of a political question. It would be like aliens landing on this planet, and the first question we ask wouldn't be what does this mean for the economy, it would be are they...
Context: Elon Musk thinks there's an issue in the 5-7 year timeframe (probably due to talking to Demis Hassabis at Deepmind, I would guess). By that standard I'm also less afraid of AI than Elon Musk, but as Rob Bensinger will shortly be fond of saying, this conflates AGI danger with AGI imminence (a very very common conflation).
I'm talking about the fact that humans can (and sometimes do) sort of optimize the universe. Like, you can reason about the way the universe is and decide to work on causing it to be in a certain state.
So people say they have general goals, but in reality they remain human beings with various tendencies, and continue to act according to those tendencies, and only support that general goal to the extent that it's consistent with those other behaviors.
This could very well be the case, but humans still sometimes sort of optimize the universe. Like, I'm ...
In that sense I think the orthogonality thesis will turn out to be false in practice, even if it is true in theory. It is simply too difficult to program a precise goal into an AI, because in order for that to work the goal has to be worked into every physical detail of the thing. It cannot just be a modular add-on.
I find this plausible but not too likely. There are a few things needed for a universe-optimizing AGI:
really good mathematical function optimization (which you might be able to use to get approximate Solomonoff induction)
a way to specify
I, too, think that AIs that don't optimize a function over the universe (but might optimize one over a domain) are more likely to be safe. This is quite related to the idea of tool AI, proposed by Holden and criticized by Eliezer.
The key here seems to be creating a way to evaluate and search for self-improvements in a way that won't cause optimization over universe states. In theory, evaluation of a self-improvement might be able to be restricted to a domain: does this modification help me play chess better according to a model of the situation in which ...
Am working on it - as a placeholder, for many problems, one can use Stuart Armstrong's proposed algorithm of finding the best strategy according to a non-anthropic viewpoint that adds the utilities of different copies of you, and then doing what that strategy says.
I think this essentially leads to SIA. Since you're adding utilities over different copies of you, it follows that you care more about universes in which there are more copies of you. So your copies should behave as if they anticipate the probability of being in a universe containing lots of...
I'm not sure what you mean by "vanilla anthropics". Both SSA and SIA are "simple object-level rules for assigning anthropic probabilities". Vanilla anthropics seems to be vague enough that it doesn't give an answer to the doomsday argument or the presumptuous philosopher problem.
On another note, if you assume that a nonzero percentage of the multiverse's computation power is spent simulating arbitrary universes with computation power in proportion to the probabilities of their laws of physics, then both SSA and SIA will end up giving you very similar predictions to Brian_Tomasik's proposal, although I think they might be slightly different.
Okay. We seem to be disputing definitions here. By your definition, it is totally possible to build a very good cross-domain optimizer without it being an agent (so it doesn't optimize a utility function over the universe). It seems like we mostly agree on matters of fact.
I agree with this but I prefer weighting things by computation power instead of physics cells (which may turn out to be somewhat equivalent). It's easy to justify this model by assuming that some percentage of the multiverse's computation power is spent simulating all universes in parallel. See Schmidhuber's paper on this.
Well, he's right that intentionally evil AI is highly unlikely to be created:
Malevolent AI would need all these capabilities, and then some. Both an intent to do something and an understanding of human goals, motivations, and behaviors would be keys to being evil towards humans.
which happens to be the exact reason why Friendly AI is difficult. He doesn't directly address things that don't care about humans, like paperclip maximizers, but some of his arguments can be applied to them.
...Expecting more computation to just magically get to intentional int
I don't think the connotations of "silly" are quite right here. You could still use this program to do quite a lot of useful inference and optimization across a variety of domains, without killing everyone. Sort of like how frequentist statistics can be very accurate in some cases despite being suboptimal by Bayesian standards. Bostrom mostly only talks about agent-like AIs, and while I think that this is mostly the right approach, he should have been more explicit about that. As I said before, we don't currently know how to build agent-like AGIs because we haven't solved the ontology mapping problem, but we do know how to build non-agentlike cross-domain optimizers given enough computation power.
1) Perhaps you give it one domain and a utility function within that domain, and it returns a good action in this domain. Then you give it another domain and a different utility function, and it returns a good action in this domain. Basically I'm saying that it doesn't maximize a single unified utility function.
2) You prove too much. This implies that the Unix cat program has a utility function (or else it is wasting effort). Technically you could view it as having a utility function of "1 if I output what the source code of cat outputs, 0 otherwi...
This one. It doesn't log data.