All of agilecaveman's Comments + Replies

It's Pasha Kamyshev, btw :) My main engagement is through:

1. reading MIRI papers, especially the older agent foundations agenda papers

2. following the flashy developments in AI, such as Dota / Go RL and being somewhat skeptical of the "random play" part of the whole thing (other things are indeed impressive)

3. Various math textbooks: Category Theory for Programmers, Probability Theory: The Logic of Science, and others

4. Trying to implement certain theories in code (quantilizers, different prediction market mechanisms)

5. Statistical investigations into various claims of "algorithmic bias"

6. Conversations with various people in the community on the topic
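On item 4: a minimal sketch of what implementing a quantilizer in code might look like. This is only an assumption about the shape of such an implementation — the selection rule below is the standard "sample uniformly from the top q-fraction" idea, not any particular published codebase.

```python
import random

# Minimal quantilizer sketch: instead of argmaxing the utility function,
# a q-quantilizer samples uniformly from the top q-fraction of actions.
def quantilize(actions, utility, q, rng=random):
    ranked = sorted(actions, key=utility, reverse=True)
    top = ranked[:max(1, int(len(ranked) * q))]
    return rng.choice(top)

actions = list(range(100))
pick = quantilize(actions, utility=lambda a: a, q=0.1)
assert 90 <= pick <= 99  # always from the top 10% under this utility
```

The point of the sampling step is to limit how hard the agent optimizes: a mis-specified utility function gets exploited less badly than under pure argmax.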

This is excellent. I believe that this result is a good simulation of "what we could expect if the universe is populated by aliens".

https://steemit.com/fermiparadox/@pasha-kamyshev/fermi-paradox

TL;DR:

Assuming the following:

1) aliens consider both destroying other civilizations and making contact too early to be forms of defection

2) aliens reason from UDT (updateless decision theory) principles

3) advanced civilizations have some capacity to simulate non-advanced ones

Then, roughly, the model in the post will work to explain what the strategic equilibrium is.

If this is indeed a typo, please correct it in the top-level post and link to this comment. The broader point is that the interpretation of Pr(H∣X2,M) is the probability of heads conditioned on Monday and X2, while Pr(H∣X2) is the probability of heads conditioned on X2 alone. In the later paragraphs, you seem to use the second interpretation. In fact, it seems your whole post's argument and "solution" rest on this typo.

Dismissing betting arguments is very reminiscent of dismissing one-boxing in Newcomb's because one defines "CDT" as rati... (read more)

7ksvanhorn
No, P(H | X2, M) is Pr(H∣X2,M), and not Pr(H∣X2,Monday). Recall that M is the proposed model. If you thought it meant "today is Monday," I question how closely you read the post you are criticizing.

I find it ironic that you write "Dismissing betting arguments is very reminiscent of dismissing one-boxing in Newcomb's" -- in an earlier version of this blog post I brought up Newcomb myself as an example of why I am skeptical of standard betting arguments (not sure why or how that got dropped). The point was that standard betting arguments can get the wrong answer in some problems involving unusual circumstances where a more comprehensive decision theory is required (perhaps FDT).

Re constructing rational agents: this is one use of probability theory; it is not "the point". We can discuss logic from a purely analytical viewpoint without ever bringing decisions and agents into the discussion. Logic and epistemology are legitimate subjects of their own, quite apart from decision theory. And probability theory is the unique extension of classical propositional logic to handle intermediate degrees of plausibility.

You say you have read PTLOS and others. Have you read Cox's actual paper, or any of the detailed discussions of it, such as Paris's discussion in The Uncertain Reasoner's Companion, or my own "Constructing a Logic of Plausible Inference: A Guide to Cox's Theorem"? If you think that Cox's Theorem has too many arguable technical requirements, then I invite you to read my paper, "From Propositional Logic to Plausible Reasoning: A Uniqueness Theorem" (preprint here). That proof assumes only that certain existing properties of classical propositional logic be retained when extending the logic to handle degrees of plausibility. It does not assume any particular functional decomposition of plausibilities, nor does it even assume that plausibilities must be real numbers. As with Cox, we end up with the result that the logic must be isomorphic to probability theory. In addit

I think this post is fairly wrong headed.

First, your math seems to be wrong.

Your numerator is ½·p(y), which looks like Pr(H∣M) · Pr(X2∣H,M).

Your denominator is ½·p(y) + ½·p(y)(2−q(y)), which looks like

Pr(H∣M) · Pr(X2∣H,M) + Pr(¬H∣M) · Pr(X2∣¬H,M), which is Pr(X2∣M).

By Bayes' rule, Pr(H∣M) · Pr(X2∣H,M) / Pr(X2∣M) = Pr(H∣X2,M), which is not the same as the quantity you claimed to compute, Pr(H∣X2). Unless you have some other derivation, or a good reason why you omitted M in your calculations, this isn't really "solving" anything.
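The ratio above can be checked numerically. The values of p and q below are arbitrary illustrative assumptions standing in for p(y) and q(y); the point is only that the ratio is the Bayes-rule posterior Pr(H∣X2,M), and that it simplifies to 1/(3 − q(y)), independent of p(y):

```python
# Numeric check of the Bayes-rule identity above (illustrative values only;
# p stands in for p(y) and q for q(y) from the post).
p, q = 0.3, 0.7

num = 0.5 * p                      # Pr(H|M) * Pr(X2|H,M)
den = 0.5 * p + 0.5 * p * (2 - q)  # ... + Pr(~H|M) * Pr(X2|~H,M) = Pr(X2|M)
posterior = num / den              # Pr(H|X2,M) by Bayes' rule

# The p(y) factor cancels: the posterior is 1/(3 - q), whatever p is.
assert abs(posterior - 1 / (3 - q)) < 1e-12
```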

Second,... (read more)

3ksvanhorn
That's a typo. I meant to write Pr(H∣X2,M), not Pr(H∣X2). I'll have more to say soon about what I think is the correct betting argument. Until then, see my comment in reply to Radford Neal about disagreement on how to apply betting arguments to this problem.

I said logically prior, not chronologically prior. You cannot have decision theory without probability theory -- the former is necessarily based on the latter. In contrast, probability theory requires no reference to decision theory for its justification and development. Have you read any of the literature on how probability theory is either an or the uniquely determined extension of propositional logic to handle degrees of certainty? If not, see my references. Neither Cox's Theorem nor my theorem relies on any form of decision theory.

I'll repeat my response to Jeff Jo: The standard textbook definition of a proposition is a sentence that has a truth value of either true or false. The problem with a statement whose truth varies with time is that it does not have a simple true/false truth value; instead, its truth value is a function from time to the set {true,false}. In logical terms, such a statement is a predicate, not a proposition. For example, "Today is Monday" corresponds to the predicate P(t)≜(dayof(t)=Monday). It doesn't become a proposition until you substitute in a specific value for t, e.g. "Unix timestamp 1527556491 is a Monday."

You have not considered the possibility that the usual decision analysis applied to this problem is wrong. There is, in fact, disagreement as to what the correct decision analysis is. I will be writing more on this in a future post. In fact, I explicitly said that at the instant of awakening, Beauty's probability is the same as the prior, because at that point she does not yet have any new information. As she receives sensory input, her probability for Heads decreases asymptotically to 1/2. All of this is just standard probability theory, conditioning on the new i

I think it's worth distinguishing between "smallest" and "fastest" circuits.

A note on smallest.

1) Consider a travelling salesman problem and a small program that brute-forces the solution to it. If the "daemon" wants to make the travelling salesman visit a particular city first, then it would simply order the solution space to consider that city first. This has no guarantee of working, but the daemon would get what it wants some of the time. More generally, if there is a class of solutions we are indifferent to, but daemons ha... (read more)
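The ordering trick in (1) can be made concrete. Below is a toy 4-city instance, constructed as an assumption so that two tours tie for optimal cost; the brute-force solver returns the first minimum it encounters, so whoever controls enumeration order controls the tie-break:

```python
from itertools import permutations

# Toy symmetric TSP where two tours tie for optimal cost.
dist = {
    ('A', 'B'): 1, ('B', 'C'): 1, ('C', 'D'): 1, ('D', 'A'): 1,
    ('A', 'C'): 2, ('B', 'D'): 2,
}

def d(x, y):
    return dist.get((x, y)) or dist[(y, x)]

def cost(tour):
    return sum(d(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

def solve(candidates):
    return min(candidates, key=cost)  # the first minimum wins ties

cities = ['A', 'B', 'C', 'D']
tours = [('A',) + p for p in permutations(cities[1:])]  # fix start at A

honest = solve(tours)
# A "daemon" that wants city D visited right after A just reorders candidates:
biased = solve(sorted(tours, key=lambda t: t[1] != 'D'))

assert cost(honest) == cost(biased)  # both tours are genuinely optimal...
assert honest != biased              # ...but they are different tours
```

Both answers pass any check that only verifies optimality of the cost, which is exactly the "class of solutions we are indifferent to" being exploited.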

This is really good; however, I would love some additional discussion of the way the current optimization changes the user.

Keep in mind, when Facebook optimizes "clicks" or "scrolls", it does so by altering user behavior, thus altering the user's internal S1 model of what is important. This can frequently distort reality, beliefs, and self-esteem. There have been many articles and studies correlating Facebook usage with mental health. However, simply understanding "optimization" is enough evidence that th... (read more)

0Raemon
Why do you no longer work at FB? (It seems like more people who care about things should try working at FB, in particular if there was any learnable path to gaining any degree of power over algorithms or values-of-the-company, but maybe this is just hopelessly naive)
3moridinamael
It's just baffling to me that this happened, because it seems on-face obvious that "outrageous" or intentionally politically inflammatory material would be an undesirable attractor in interest-space. My Facebook feed thinks that I'm most interested in the stupidest and most inflammatory individuals and ideas because that's where my eyes linger for reasons that I don't reflectively approve of. I wonder how quickly it would "learn" otherwise if I made an effort to break this pattern.
agilecavemanΩ110

I am also confused. How does this do against EABot, aka C1=□(Them(Them)=D) and M = DefectBot. Is the number of boxes not well defined in this case?

0Scott Garrabrant
So according to the original Modal Combat framework, EABot is not a Modal Agent. The bots are not allowed to simulate Them(Them).

hmm, looks like the year is wrong and the delete button has failed to work :(

Maybe this has been said before, but here is a simple idea:

Directly specify a utility function U which you are not sure about, but also discount the AI's own power as part of it. So the new utility function is U − power(AI), where power is a fast-growing function of a mix of the AI's source-code complexity, intelligence, hardware, and electricity costs. One needs to be careful about how to define "self" in this case, as a careful redefinition by the AI will remove the controls.

One also needs to consider the creation of subagents with proper utilities as well,... (read more)
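The penalized objective can be sketched in a few lines. All names and numbers here are illustrative assumptions, not a worked proposal; the only point is that a fast-growing penalty in the power proxy dominates the comparison for power-hungry policies:

```python
# Toy sketch of scoring candidate policies by U(policy) - power(policy).
policies = {
    # name: (raw utility U, "power" proxy, e.g. resources acquired)
    "modest":    (10.0, 1.0),
    "ambitious": (50.0, 30.0),
    "takeover":  (100.0, 1000.0),
}

def power_penalty(p):
    return p ** 2  # fast-growing in the power proxy, per the comment above

def score(name):
    u, p = policies[name]
    return u - power_penalty(p)

best = max(policies, key=score)
assert best == "modest"  # the penalty dominates for power-hungry policies
```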

1drnickbone
Presumably anything caused to exist by the AI (including copies, sub-agents, other AIs) would have to count as part of the power(AI) term? So this stops the AI spawning monsters which simply maximise U. One problem is that any really valuable things (under U) are also likely to require high power. This could lead to an AI which knows how to cure cancer but won't tell anyone (because that will have a very high impact, hence a big power(AI) term). That situation is not going to be stable; the creators will find it irresistible to hack the U and get it to speak up.

That's an idea that a) will certainly not work as stated, b) could point the way to something very interesting.

Well, I get where you are coming from with Goodhart's Law, but that's not the question. Formally speaking, if we take the set of all utility functions with complexity below some fixed bound N, then one of them is going to be the "best", i.e. most correlated with the "true" utility function, which we can't compute.

As you point out, if we select utilities that are too simple, such as straight-up life expectancy, then even the "best" function is not "good enough" to just punch into an AGI, because it wil... (read more)

1Vaniver
I don't think correlation is a useful way to think about this. Utility functions are mappings from consequence spaces to a single real line, and it doesn't make much sense to talk about statistical properties of mappings. Projections in vector spaces are probably closer, or you could talk about a 'perversity measure' where you look at all optimal solutions to the simpler mapping and find the one with the worst score under the complex mapping. (But if you could rigorously calculate that, you have the complex utility function, and might as well use it!)

I think the MIRI value learning approach is operating at a higher meta-level here. That is, they want to create a robust methodology for learning human values, which starts with figuring out what robustness means. You've proposed that we instead try to figure out what values are, but I don't see any reason to believe that us trying to figure out what values are is going to be robust.

Regarding 2: So, I am a little surprised that step 2: Valuable goals cannot be directly specified is taken as a given.

If we consider an AI as a rational optimizer of the ONE TRUE UTILITY FUNCTION, we might want to look for the best available approximations of it in the short term. The function I have in mind is life expectancy (DALY or QALY), since to me it is easier to measure than happiness. It also captures a lot of intuition when you ask a person the following hypothetical:

if you could be born in any society on earth today, what one number would be most congruent ... (read more)

1Vaniver
Whoa, how are you measuring the disability/quality adjustment? That sounds like sneaking in 'happiness' measurements, and there are a bunch of challenges: we already run into issues where people who have a condition rate it as less bad than people who don't have it. (For example, sighted people rate being blind as worse than blind people rate being blind.)

There's a general principle in management that really ought to be a larger part of the discussion of value learning: Goodhart's Law. Right now, life expectancy is higher in better places, because good things are correlated. But if you directed your attention to optimizing towards life expectancy, you could find many things that make life less good but longer (or your definition of "QALY" needs to include the entirety of what goodness is, in which case we have made the problem no easier).

But here's where we come back to Goodhart's Law: regardless of what simple measure you pick, it will be possible to demonstrate a perverse consequence of optimizing for that measure, because simplicity necessarily cuts out complexity that we don't want to lose. (If you didn't cut out the complexity, it's not simple!)

Note: I may be in over my head here in math-logic world.

For procrastination paradox:

There seems to be a desire to formalize

T proves G => G, which messes with completeness. Why not straight up try to formalize:

T proves G at time t => T proves G at time t+1 for all t > 0

That way: G => the button gets pressed at some time X and wasn't pressed at X−1.

However, if T proves G at X−1, it must also prove G at X for all X > 1; therefore it won't press the button unless X = 1.

Basically, instead of reasoning about whether proving something makes it true,... (read more)

1So8res
An interesting idea :-) A system that generically infers "phi with t replaced by t+1" from "phi is provable" is obviously inconsistent (consider phi to be the sentence "there exists t such that t=0"). You might be able to set something up where a system can infer this for G specifically, but I'm not sure how this is supposed to get you around the procrastination paradox:

Say G is "exists t. ButtonPressedAt(t) and not ButtonPressedAt(t-1)", and that there is a parent agent which will take the first action it considers where it can prove that taking the action implies G. Say that there is an action which creates a child using the same proof system, and that the parent considers building the child, and that it can show that the child will take actions only if they provably imply G. The parent concludes that, if it builds the child, it will either press the button or G will be provable. In the latter case, G will be true at t+1, which means that in the latter case, "exists t+1. ButtonPressedAt(t+1) and not ButtonPressedAt(t)." But this implies G (assuming we start at t=1, to allay concerns about whether t-1 exists), and so the parent procrastinates.

That said, you can indeed build a system that leverages the fact that the child is making proofs at a different timestep in order to deal with the procrastination paradox: this is one of the ideas behind Benja's "parametric polymorphism" (Section 4.2 of this paper). Parametric polymorphism is only a partial solution, and it still has a number of limitations, but following the general idea (of exploiting the fact that parent and child are in different timesteps) does lead to interesting places :-)