
Comment author: ScottL 18 August 2015 06:01:03AM 1 point [-]

I updated the first example to one that is similar to the one above by Tyrrell_McAllister. Can you please let me know if it solves the issues you had with the original example?

Comment author: Kaj_Sotala 19 August 2015 02:48:53PM 1 point [-]

That does look better! Though since I can't look at it with fresh eyes, I can't say how I'd interpret it if I were to see it for the first time now.

Comment author: ScottL 18 August 2015 02:43:40AM *  0 points [-]

I'd guess that getting this question "correct" almost requires having been trained to parse the problem in a certain formal way — namely, purely in terms of propositional logic.

To get the question correct, you just need to consider the falsity of the premises. You don't necessarily have to parse the problem in a formal way, although that would help.

On this reading, Ace is most probable.

Ace is not more probable. It is impossible to have an ace in the dealt hand, due to the requirement that only one of the premises is true. The basic idea is that one of the premises must be false, which means that an ace is impossible: if an ace were in the dealt hand, then both premises would be true, which violates the requirement that exactly one of these statements is true. I have explained this further in this post.
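
A minimal brute-force check of this argument, as a sketch: it assumes the two premises are the standard Johnson-Laird pair, (1) "there is a king in the hand, or there is an ace, or both" and (2) "there is a queen in the hand, or there is an ace, or both", with exactly one premise true; that exact wording is my reconstruction rather than a quote from the article.

    # Enumerate all hands and keep those where exactly one premise holds.
    # Premise wording is assumed (see above), not taken from the article.
    from itertools import product

    consistent_hands = []
    for king, queen, ace in product([False, True], repeat=3):
        premise1 = king or ace          # "a king or an ace (or both)"
        premise2 = queen or ace         # "a queen or an ace (or both)"
        if premise1 != premise2:        # exactly one premise is true
            consistent_hands.append((king, queen, ace))

    print(any(ace for _, _, ace in consistent_hands))    # False: an ace is impossible
    print(any(king for king, _, _ in consistent_hands))  # True: a king can occur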

Comment author: Kaj_Sotala 18 August 2015 04:03:39AM *  4 points [-]

I think the problem here is that you're talking to people who have been trained to think in terms of probabilities and probability trees, and furthermore, asking "what is more likely" automatically primes people to think in terms of a probability tree.

The way I originally thought about this was:

  • Suppose premise 1 is true. Then two of the three possible combinations contain a king, so a 2/3 probability for a king; and since I guess we're supposed to assume that premise 1 has a 50% probability, that means the king has a 2/3 × 1/2 = 1/3 probability overall. By the same logic, the ace has a 2/3 probability in this branch, for a 1/3 probability overall.
  • Now suppose that premise 2 is true. By the same logic as above, this branch contributes an additional 1/3 to the ace's probability mass. But this branch has no king, so the king acquires no probability mass.
  • Thus the chance of an ace is 2/3 and the chance of a king is 1/3.
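
Spelling the branch arithmetic above out as a quick sketch (using the same 50/50 assumption over which premise is the true one, a uniform choice among each premise's satisfying combinations, and taking premise 2's non-ace card to be a queen, as in the standard version of the puzzle):

    # Probability-tree reading: pick the "true" premise with probability 1/2,
    # then pick uniformly among the combinations that satisfy it.
    # (This mirrors the informal reasoning above, not the intended solution.)
    branch1 = [{"king"}, {"ace"}, {"king", "ace"}]    # models of "king or ace or both"
    branch2 = [{"queen"}, {"ace"}, {"queen", "ace"}]  # models of "queen or ace or both"

    def prob_of(card):
        return sum(0.5 * (1 / len(branch)) * (card in hand)
                   for branch in (branch1, branch2)
                   for hand in branch)

    print(prob_of("king"))  # 0.333... = 1/3
    print(prob_of("ace"))   # 0.666... = 2/3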

In other words, I interpreted the "only one of the following premises is true" as "each of these two premises has a 50% probability", to a large extent because the question of likeliness primed me to think in terms of probability trees, not logical possibilities.

Arguably, more careful thought would have suggested that possibly I shouldn't think of this as a probability tree, since you never specified the relative probabilities of the premises, and giving them some relative probability was necessary for building the probability tree. On the other hand, in informal probability puzzles it's common to assume that if we're picking one option out of a set of N options, then each option has a probability of 1/N unless otherwise stated. Thus, this wording is ambiguous.

In one sense, my interpreting the problem in these terms could be taken to support the claims of model theory - after all, I was focusing on only one possible model at a time, and failed to properly consider their conjunction. But on the other hand, it's also known that people tend to interpret things in whatever framework they've been taught to interpret them in, and to use the context to guide their choice of the appropriate framework in the case of ambiguous wording. Here the context was the use of the word "likely", guiding the choice towards the probability tree framework. So I would claim that this example alone isn't sufficient to distinguish whether a person reading it gives the incorrect answer because of the processes predicted by model theory alone, or because the person misinterpreted the intent of the wording.

Comment author: Kaj_Sotala 18 August 2015 03:30:37AM 3 points [-]

Great article! A concrete demonstration of some of the ways we reason, and how that might lead to various fallacies.

It might have been better to make it into a series of shorter posts - as it was, I initially had the same misreading about the "King or Ace" question as alicey did (even after the rewrite) and stopped to spend considerable time thinking about it and your claim that King was more likely, after which I had enough mental fatigue that I didn't feel like going through the rest of the examples anymore. But I intend to return to this article later to finish reading it.

Comment author: cousin_it 04 August 2015 07:28:24PM *  6 points [-]

Wikipedia on Chalmers, consciousness, and zombies:

Chalmers argues that since such zombies are conceivable to us, they must therefore be logically possible. Since they are logically possible, then qualia and sentience are not fully explained by physical properties alone.

That kind of reasoning allows me to prove so many exciting things! I can imagine a world where gravity is Newtonian but orbits aren't elliptical (my math skills are poor but my imagination is top notch), therefore Newtonian gravity cannot explain elliptical orbits. And so on.

Am I being a hubristic idiot for thinking I can disprove a famous philosopher so casually?

Comment author: Kaj_Sotala 06 August 2015 12:18:38PM 3 points [-]

My default assumption is that if someone smart says something that sounds obviously false to me, either they're giving their words different meanings than I am, or alternatively the two-sentence version is skipping a lot of inferential steps.

Compare the cautionary tale of talking snakes.

Comment author: Elo 31 July 2015 04:52:46AM *  4 points [-]

I travelled to a different city for a few days and realised I should actively avoid trying to gather geographical information (beyond a rough sense), to free up my brain space for more important things. Then I realised I should do that near home as well.

Two-part question:

  1. What do you outsource that is common and uncommon among people that you know?
  2. What should you be avoiding keeping in your brain that you currently are? (some examples might be birthdays, what day of the week it is, city-map-location, schedules/calendars, task lists, shopping lists)

And while we are at it: What automated systems have you set up?

Comment author: Kaj_Sotala 04 August 2015 12:45:44PM 4 points [-]

I was under the impression that "brain space" was unlimited for all practical intents and purposes, and that having more stuff in your brain might actually even make extra learning easier - e.g. I've often heard it said that a person loses fluid intelligence when they age, but this is compensated by them having more knowledge that they can connect new things with. Do you know of studies to the contrary?

Comment author: So8res 30 July 2015 05:08:23PM *  10 points [-]

Thanks for the reply, Jacob! You make some good points.

Why not test safety long before the system is superintelligent? - say when it is a population of 100 child like AGIs. As the population grows larger and more intelligent, the safest designs are propagated and made safer.

I endorse eli_sennesh's response to this part :-)

This again reflects the old 'hard' computer science worldview, and obsession with exact solutions.

I am not under the impression that there are "exact solutions" available, here. For example, in the case of "building world-models," you can't even get "exact" solutions using AIXI (which does Bayesian inference using a simplicity prior in order to guess what the environment looks like; and can never figure it out exactly). And this is in the simplified setting where AIXI is large enough to contain all possible environments! We, by contrast, need to understand algorithms which allow you to build a world model of the world that you're inside of; exact solutions are clearly off the table (and, as eli_sennesh notes, huge amounts of statistical modeling are on it instead).
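
(For concreteness, the "Bayesian inference using a simplicity prior" is, roughly, the textbook Solomonoff-style mixture from the AIXI literature - standard notation, nothing specific to this thread:

    \xi(x_{1:n}) = \sum_{\nu \in \mathcal{M}} 2^{-K(\nu)} \, \nu(x_{1:n})

where \mathcal{M} is the class of computable environments and K(\nu) is the description length of environment \nu. The posterior over \mathcal{M} sharpens with evidence but never collapses to an exact answer.)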

I would readily accept a statistical-modeling-heavy answer to the question of "but how do you build multi-level world-models from percepts, in principle?"; and indeed, I'd be astonished if you avoided it.

Perhaps you read "we need to know how to do X in principle before we do it in practice" as "we need a perfect algorithm that gives you bit-exact solutions to X"? That's an understandable reading; my apologies. Let me assure you again that we're not under the illusion you can get bit-exact solutions to most of the problems we're working on.

For example - perhaps using lots and lots of computing power makes the problem harder instead of easier. How could that be? Because with lots and lots of compute power, you are naturally trying to extrapolate the world model far far into the future, where it branches enormously [...]

Hmm. If you have lots and lots of computing power, you can always just... not use it. It's not clear to me how additional computing power can make the problem harder -- at worst, it can make the problem no easier. I agree, though, that algorithms for modeling the world from the inside can't just extrapolate arbitrarily, on pain of exponential complexity; so whatever it takes to build and use multi-level world-models, it can't be that.

Perhaps the point where we disagree is that you think these hurdles suggest that figuring out how to do things we can't yet do in principle is hopeless, whereas I'm under the impression that these shortcomings highlight places where we're still confused?

In response to comment by So8res on MIRI's Approach
Comment author: Kaj_Sotala 02 August 2015 11:28:18PM *  8 points [-]

Hmm. If you have lots and lots of computing power, you can always just... not use it. It's not clear to me how additional computing power can make the problem harder -- at worst, it can make the problem no easier.

Additional computing power might not make the problem literally harder, but the assumption of limitless computing power might direct your attention towards wrong parts of the search space.

For example, I suspect that the whole question about multilevel world-models might be something that arises from conceptualizing intelligence as something like AIXI, which implicitly assumes that there's only one true model of the world. It can do this because it has infinite computing power and can just replace its high-level representation of the world with one where all high-level predictions are derived from the basic atom-level interactions, something that would be intractable for any real-world system to do. Instead, real-world systems will need to flexibly switch between different kinds of models depending on the needs of the situation, and use lower-level models in situations where the extra precision is worth the expense of extra computing time. Furthermore, those lower-level models will have been defined in terms of what furthers the system's goals, as defined on the higher levels: it will pay preferential attention to those features of the lower-level model that allow it to further its higher-level goals.

In the AIXI framing, the question of multilevel world-models is "what happens when the AI realizes that the true world model doesn't contain carbon atoms as an ontological primitive". In the resource-limited framing, that whole question isn't even coherent, because the system has no such thing as a single true world-model. Instead the resource-limited version of how to get multilevel world-models to work is something like "how to reliably ensure that the AI will create a set of world models in which the appropriate configuration of subatomic objects in the subatomic model gets mapped to the concept of carbon atoms in the higher-level model, while the AI's utility function continues to evaluate outcomes in terms of this concept regardless of whether it's using the lower- or higher-level representation of it".
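
A toy sketch of what the reframed question is pointing at (purely illustrative; the class names, the lifting rule, and the cost/precision switch below are invented for this example and aren't anyone's actual proposal):

    # Toy illustration: two world models at different levels of detail, a mapping
    # from the low-level state to the high-level concept "carbon atom", and a
    # utility function that scores outcomes via that concept regardless of which
    # representation produced them. All names are invented for this example.
    from dataclasses import dataclass

    @dataclass
    class HighLevelState:
        carbon_atoms: int              # the concept the utility function cares about

    @dataclass
    class LowLevelState:
        nuclei: list                   # e.g. [(protons, neutrons), ...]

    def lift(state: LowLevelState) -> HighLevelState:
        # Map the subatomic description onto the higher-level concept:
        # count nuclei with six protons as "carbon atoms".
        return HighLevelState(sum(1 for protons, _ in state.nuclei if protons == 6))

    def utility(state) -> float:
        # Defined over the high-level concept only; lower-level states are
        # lifted before being evaluated.
        if isinstance(state, LowLevelState):
            state = lift(state)
        return float(state.carbon_atoms)

    def choose_model(precision_needed, compute_budget):
        # Crude stand-in for "use the expensive low-level model only when the
        # extra precision is worth the computing time".
        return "low-level" if precision_needed > compute_budget else "high-level"

    low = LowLevelState(nuclei=[(6, 6), (6, 7), (1, 0)])
    print(utility(low), utility(lift(low)))   # same score either way: 2.0 2.0
    print(choose_model(0.9, 0.5))             # "low-level"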

As an aside, this reframed version seems like the kind of question that you would need to solve in order to have any kind of AGI in the first place, and one which experimental machine learning work would seem the best suited for, so I'd assume it to get naturally solved by AGI researchers even if they weren't directly concerned with AI risk.

In response to comment by LawrenceC on MIRI's Approach
Comment author: So8res 30 July 2015 09:55:29PM 9 points [-]

It's cited a lot in MIRI's writing because it's the first example that pops to my mind, and I'm the one who wrote all the writings where it appears :-p

For other examples, see maybe "Artificial Evolution in the Physical World" (Thompson, 1997) or "Computational Genetics, Physiology, Metabolism, Neural Systems, Learning, Vision, and Behavior or PolyWorld: Life in a New Context." (Yaeger, 1994). IIRC.

In response to comment by So8res on MIRI's Approach
Comment author: Kaj_Sotala 31 July 2015 08:58:59AM *  7 points [-]

Note that only ever citing a single example easily gives the impression that it's the only example you know of, or that this is an isolated special case, so at least briefly mentioning the existence of others could be better.

Comment author: Stuart_Armstrong 15 July 2015 09:53:31AM 0 points [-]

Because FAIs can change themselves very effectively in ways that we can't.

It might be that a human brain running as computer software would have the same issues.

Comment author: Kaj_Sotala 15 July 2015 01:02:16PM *  2 points [-]

Because FAIs can change themselves very effectively in ways that we can't.

Doesn't mean the FAI couldn't remain genuinely uncertain about some value question, or consider it not worth solving at this time, or run into new value questions due to changed circumstances, etc.

All of those could prevent reflective equilibria, while still being compatible with the ability for extensive self-modification.

Comment author: shminux 13 July 2015 03:15:44PM 10 points [-]

As you mention, so far every attempt by humans to have a self-consistent value system (the process also known as decompartmentalization) has resulted in less-than-desirable outcomes. What if the end goal of having a thriving, long-lasting (super-)human(-like) society is self-contradictory, and there is no such thing as both "nice" and "self-referentially stable"? Maybe some effort should be put into figuring out how to live, and thrive, while managing the unstable self-reference and possibly avoiding convergence altogether.

Comment author: Kaj_Sotala 15 July 2015 06:59:46AM 10 points [-]

A thought I've been having lately, derived from a reinforcement-learning view of values and also somewhat inspired by Nate's recent post on resting in motion: value convergence seems to suggest a static endpoint, with some set of "ultimate values" we'll eventually reach and hold ever after. But so far societies have never reached such a point, and if our values are an adaptation to our environment (including the society and culture we live in), then it would suggest that as long as we keep evolving and developing, our values will keep changing and evolving with us, without there being any meaningful endpoint.

There will always (given our current understanding of physics) be only a finite amount of resources available, and unless we either all merge into one enormous hivemind or get turned into paperclips, there will likely be various agents with differing preferences on what exactly to do with those resources. As the population keeps changing and evolving, the various agents will keep acquiring new kinds of values, and society will keep rearranging itself to a new compromise between all those different values. (See: the whole history of the human species so far.)

Possibly we shouldn't so much try to figure out what we'd prefer the final state to look like, but rather what we'd prefer the overall process to look like.

(The bias towards trying to figure out a convergent end-result for morality might have come from LW's historical tendency to talk and think in terms of utility functions, which implicitly assume a static and unchanging set of preferences, glossing over the fact that human preferences keep constantly changing.)

Comment author: Wei_Dai 14 July 2015 03:39:50AM 2 points [-]

Anyone know more about this proposal from IDSIA?

Technical Abstract: "Whenever one wants to verify that a recursively self-improving system will robustly remain benevolent, the prevailing tendency is to look towards formal proof techniques, which however have several issues: (1) Proofs rely on idealized assumptions that inaccurately and incompletely describe the real world and the constraints we mean to impose. (2) Proof-based self-modifying systems run into logical obstacles due to Löb's theorem, causing them to progressively lose trust in future selves or offspring. (3) Finding nontrivial candidates for provably beneficial self-modifications requires either tremendous foresight or intractable search.

Recently a class of AGI-aspiring systems that we call experience-based AI (EXPAI) has emerged, which fix/circumvent/trivialize these issue. They are self-improving systems that make tentative, additive, reversible, very fine-grained modifications, without prior self-reasoning; instead, self-modifications are tested over time against experiential evidences and slowly phased in when vindicated or dismissed when falsified. We expect EXPAI to have high impact due to its practicality and tractability. Therefore we must now study how EXPAI implementations can be molded and tested during their early growth period to ensure their robust adherence to benevolence constraints."

I did some searching but Google doesn't seem to know anything about this "EXPAI".

Comment author: Kaj_Sotala 14 July 2015 06:29:15AM 2 points [-]

I didn't find anything on EXPAI either, but there's the PI's list of previous publications. At least his Bounded Seed-AGI paper sounds somewhat related:

Abstract. Four principal features of autonomous control systems are left both unaddressed and unaddressable by present-day engineering methodologies: (1) The ability to operate effectively in environments that are only partially known at design time; (2) A level of generality that allows a system to re-assess and redefine the fulfillment of its mission in light of unexpected constraints or other unforeseen changes in the environment; (3) The ability to operate effectively in environments of significant complexity; and (4) The ability to degrade gracefully— how it can continue striving to achieve its main goals when resources become scarce, or in light of other expected or unexpected constraining factors that impede its progress. We describe new methodological and engineering principles for addressing these shortcomings, that we have used to design a machine that becomes increasingly better at behaving in underspecified circumstances, in a goal-directed way, on the job, by modeling itself and its environment as experience accumulates. The work provides an architectural blueprint for constructing systems with high levels of operational autonomy in underspecified circumstances, starting from only a small amount of designer-specified code—a seed. Using value-driven dynamic priority scheduling to control the parallel execution of a vast number of lines of reasoning, the system accumulates increasingly useful models of its experience, resulting in recursive self-improvement that can be autonomously sustained after the machine leaves the lab, within the boundaries imposed by its designers. A prototype system named AERA has been implemented and demonstrated to learn a complex real-world task—real-time multimodal dialogue with humans—by on-line observation. Our work presents solutions to several challenges that must be solved for achieving artificial general intelligence.
