Comment author: Lumifer 16 July 2014 05:18:37PM *  5 points

it's a question of whether you projected consciousness onto the code

Consciousness is much better projected onto tea kettles:

We put the kettle on to boil, up in the nose of the boat, and went down to the stern and pretended to take no notice of it, but set to work to get the other things out.

That is the only way to get a kettle to boil up the river. If it sees that you are waiting for it and are anxious, it will never even sing. You have to go away and begin your meal, as if you were not going to have any tea at all. You must not even look round at it. Then you will soon hear it sputtering away, mad to be made into tea.

It is a good plan, too, if you are in a great hurry, to talk very loudly to each other about how you don’t need any tea, and are not going to have any. You get near the kettle, so that it can overhear you, and then you shout out, “I don’t want any tea; do you, George?” to which George shouts back, “Oh, no, I don’t like tea; we’ll have lemonade instead – tea’s so indigestible.” Upon which the kettle boils over, and puts the stove out.

We adopted this harmless bit of trickery, and the result was that, by the time everything else was ready, the tea was waiting.

Comment author: johnswentworth 17 July 2014 04:32:44AM 2 points

Exactly! More realistically, plenty of religions have projected consciousness onto things. People have made sacrifices to gods, so presumably they believed the gods could be bargained with. The Greeks tried to bargain with the wind and waves, for instance.

Comment author: RichardKennaway 16 July 2014 11:44:43AM 3 points

If we're going the game theory route, there's a natural definition for consciousness: something which is being modeled as a game-theoretic agent is "conscious".

So when I've set students in a Prolog class the task of writing a program to play a game such as Kayles, was the code they wrote conscious? If not, then I think you've implicitly wrapped some idea of consciousness into your idea of a game-theoretic agent.

Comment author: johnswentworth 16 July 2014 04:38:57PM 1 point

It's not a question of whether the code "was conscious", it's a question of whether you projected consciousness onto the code. Did you think of the code as something which could be bargained with?

Comment author: [deleted] 13 July 2014 02:49:42PM *  8 points

Usually when we say "consciousness", we mean self-awareness. It's a phenomenon of our cognition that we can't explain yet; we believe it does causal work, and if it's identical with self-awareness, it might be why we're having this conversation.

I personally don't think it has much to do with moral worth, actually. It's very warm-and-fuzzy to say we ought to place moral value on all conscious creatures, but I actually believe that a proper solution to ethics is going to dissolve the concept of "moral worth" into some components like (blatantly making names up here) "decision-theoretic empathy" (agents and instances where it's rational for me to acausally cooperate), "altruism" (using my models of others' values as a direct component of my own values, often derived from actual psychological empathy), and even "love" (outright personal attachment to another agent for my own reasons -- and we'd usually say love should imply altruism).

So we might want to be altruistic towards chickens, but I personally don't think chickens possess some magical valence that stops them from being "made of atoms I can use for something else", other than the general fact that I feel some very low level of altruism and empathy towards chickens. Or, to argue Timelessly, we might say that I ought to operate with some level of altruism for the general class of minds like mine, which includes most Earth-based animals, since the foundations of our cognitive architectures evolved very, very slowly (and often in parallel shapes, under similar selection pressures); certainly I personally generally feel a moral impulse to leave Nature alone, since I cannot treat with most of it as one equal being to another.

Consciousness definitely exists, but I think it's worth not treating it as magic.

Comment author: johnswentworth 16 July 2014 04:27:49AM 2 points

If we're going the game theory route, there's a natural definition for consciousness: something which is being modeled as a game-theoretic agent is "conscious". We start projecting consciousness the moment we start modelling something as an agent in a game, i.e. predicting that it will choose its actions to achieve some objective in a manner dependent on another agent's actions. In short, "conscious" things are things which can be bargained with.

This has a bunch of interesting/useful ramifications. First, consciousness is inherently a thing which we project. Consciousness is relative: a powerful AI might find humans so simple and mechanistic that there is no need to model them as agents. Consciousness is a useful distinction for developing a sustainable morality, since you can expect conscious things to follow tit-for-tat, make deals, seek retribution, and all those other nice game-theoretical things. I care about the "happiness" of conscious things because I know they'll seek to maximize it, and I can use that. I expect conscious things to care about my own "happiness" for the same reason.

This intersects somewhat with self-awareness. A game-theoretic agent must, at the very least, have a model of its partner(s) in the game(s). The usual game-theoretic model is largely black-box, so the interior complexity of the partner is not important. The partners may have some specific failure modes, but for the most part they're just modeled as maximizing utility (that's why utility is useful in game theory, after all). In particular, since the model is mostly black-box, it should be relatively easy for the agent to model itself this way. Indeed, it would be very difficult for the agent to model itself any other way, since it would have to self-simulate. With a black-box self-model armed with a utility function and a few special cases, the agent can at least check its model against previous decisions easily.

So at this point, we have a thing which can interact with us, make deals and whatnot, and generally try to increase its utility. It has an agent-y model of us, and it can maybe use that same agent-y model for itself. Does this sound like our usual notion of consciousness?

Comment author: V_V 20 June 2014 08:35:22AM 3 points

AI != perfectly rational agent

In response to comment by V_V on The Power of Noise
Comment author: johnswentworth 21 June 2014 02:45:09AM 1 point

Ideally = perfectly rational agent

In response to The Power of Noise
Comment author: johnswentworth 20 June 2014 05:49:05AM 1 point

The randomized controlled trial is a great example where a superintelligence actually could do better by using a non-random strategy. Ideally, an AI could take its whole prior into account and do a value-of-information calculation. Even if it had no useful prior, that would just mean that any method of choosing is equally "random" under the AI's knowledge.
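
To make that concrete, here is a minimal sketch (my own construction, not from the post) of value-of-information-driven allocation: an agent with Beta priors over two trial arms assigns each subject to whichever arm is expected to shrink its posterior uncertainty the most, instead of randomizing. The arm names and true rates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Beta(a, b) posterior per arm, stored as [a, b]; start from uniform Beta(1, 1).
posteriors = {"treatment": [1.0, 1.0], "control": [1.0, 1.0]}
true_rates = {"treatment": 0.6, "control": 0.4}  # unknown to the agent

def beta_variance(a, b):
    # Variance of a Beta(a, b) distribution.
    return a * b / ((a + b) ** 2 * (a + b + 1))

def expected_variance_after_obs(a, b):
    # Expected posterior variance after one more observation on this arm,
    # averaged over the arm's predictive distribution.
    p = a / (a + b)
    return p * beta_variance(a + 1, b) + (1 - p) * beta_variance(a, b + 1)

for _ in range(100):
    # Crude value-of-information criterion: allocate the next subject to the
    # arm whose observation is expected to reduce uncertainty the most.
    gains = {arm: beta_variance(a, b) - expected_variance_after_obs(a, b)
             for arm, (a, b) in posteriors.items()}
    arm = max(gains, key=gains.get)
    success = rng.random() < true_rates[arm]
    posteriors[arm][0 if success else 1] += 1

print(posteriors)
```

With a flat prior and symmetric arms, the expected gains start out equal and the criterion degenerates to an arbitrary tie-break: any method of choosing is equally "random" under the agent's knowledge, as above.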

Comment author: zedzed 29 April 2014 01:18:54AM 4 points

Avoid listening to music while trying to absorb new information

I recently spent under $35 to get industrial earmuffs and earplugs, for a combined total of 64 dB of noise reduction. Single most cost-effective investment I've made in my learning (not counting the $0 ones).

Comment author: johnswentworth 29 April 2014 02:51:22AM 3 points

I recently spent $300 on noise-cancelling earbuds. I live in the middle of San Francisco, so it's pretty noisy. I'd tried earplugs and found them uncomfortable and the noise reduction unimpressive. The earbuds have been great for productivity in general (both at work and studying at home). I highly recommend them if you're in a noisy area and can afford it.

Comment author: IlyaShpitser 11 February 2014 05:27:52PM *  1 point

I mean that data always comes with a space, and that restricts the density.

Sorry, I am confused. Say A, B, C, D are in the [0,1] segment of the real line. This doesn't really restrict anything.

For the spaces people actually deal with we have priors.

I deal with this space. I even have a paper in preparation that deals with this space! So do lots of people that worry about learning graphs from data.

On the other hand, I would be very surprised to see any other method which works in cases where the Bayesian formalism does not yield an answer.

People use variations of the FCI algorithm, which from a Bayesian point of view is a bit of a hack. The asymptopia version of FCI assumes a conditional independence oracle, and then tells you what the model is based on what the oracle says. In practice, rather than using an oracle, people do a bunch of hypothesis tests for independence.
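
For concreteness, here is a minimal sketch (my own illustration, not code from any FCI implementation) of the kind of test that stands in for the oracle in practice: a chi-squared test of marginal independence on a 2x2 contingency table, with the accept/reject decision playing the role of the oracle's answer.

```python
import numpy as np
from scipy.stats import chi2_contingency

def independent(x, y, alpha=0.05):
    # Treat "fail to reject at level alpha" as the oracle answering
    # "independent" -- this is the hypothesis-test substitute described above.
    table = np.zeros((2, 2))
    for xi, yi in zip(x, y):
        table[xi, yi] += 1
    _, p_value, _, _ = chi2_contingency(table)
    return p_value > alpha

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=1000)
b = rng.integers(0, 2, size=1000)             # genuinely independent of a
noise = (rng.random(1000) < 0.1).astype(int)
c = a ^ noise                                 # noisy copy of a: dependent

print(independent(a, b))  # expect True
print(independent(a, c))  # expect False
```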


Regarding that ugly distribution

You are being so mean to that poor distribution. You know, H1 forms a curved exponential family if A,B,C,D are discrete. That's sort of the opposite of ugly. I think it's beautiful! H1 is an instance of Thomas Richardson's ancestral graph models, with the graph:

A <-> C <-> B <-> D <-> A

Comment author: johnswentworth 12 February 2014 03:27:37AM 1 point

Oh, saying A,B,C,D are in [0,1] restricts quite a bit. It eliminates distributions with support over all the reals, distributions over R^n, distributions over words starting with the letter k, distributions over Turing machines, distributions over elm trees more than 4 years old in New Hampshire, distributions over bizarre mathematical objects that I can't even think of... That's a LOT of prior information. It's a continuous space, so we can't apply a maximum entropy argument directly to find our prior. Typically we use the beta prior for [0,1] due to a symmetry argument, but that admittedly is not appropriate in all cases. On the other hand, unless you can find dependencies after running the data through the continuous equivalent of a pseudo-random number generator, you are definitely utilizing SOME additional prior information (e.g. via smoothness assumptions). When the Bayesian formalism does not yield an answer, it's usually because we don't have enough prior info to rule out stuff like that.

I think we're still talking past each other about the distributions. The Bayesian approach to this problem uses a hierarchical distribution with two levels: one specifying the distribution p[A,B,C,D | X] in terms of some parameter vector X, and the other specifying the distribution p[X]. Perhaps the notation p[A,B,C,D ; X] is more familiar? Anyway, the hypothesis H1 corresponds to a subset of possible values of X. The beautiful distribution you talk about is p[A,B,C,D | X], which can indeed be written quite elegantly as an exponential family distribution with features for each clique in the graph. Under that parameterization, X would be the lambda vector specifying the exponential model. Unfortunately, p[X] is the ugly one, and that elegant parameterization for p[A,B,C,D | X] will probably make p[X] even uglier.
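
Restated in symbols (my notation, not anything from the original exchange), the two levels and the role of H1 look like this:

```latex
\[
  \underbrace{p[A,B,C,D \mid X]}_{\text{data model, indexed by } X}
  \qquad
  \underbrace{p[X]}_{\text{prior over parameters}}
\]
\[
  P[\mathrm{data} \mid H_1]
  = \int_{S_{H_1}} p[\mathrm{data} \mid X]\, p[X \mid H_1]\, dX,
  \qquad
  S_{H_1} = \{\, X : A \perp B \text{ and } C \perp D \text{ under } p[\,\cdot \mid X] \,\}
\]
```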

It is much prettier for DAGs. In that case, we'd have one beta distribution for every possible set of inputs to each variable. X would then be the set of parameters for all those beta distributions. We'd get elegant generative models for numerical integration and life would be sunny and warm. So the simple use case for FCI is amenable to Bayesian methods. Latent variables are still a pain, though. They're fine in theory (just integrate over them when calculating the posterior), but it gets ugly fast.
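
Here is a minimal sketch (my own construction, with a made-up dataset and DAG) of why the DAG case is prettier: with a Beta prior for each variable under each configuration of its parents, the marginal likelihood factorizes into closed-form Beta-Bernoulli terms.

```python
from math import lgamma
from itertools import product

def log_beta_bernoulli(successes, failures, a=1.0, b=1.0):
    # log marginal likelihood of a Bernoulli sequence under a Beta(a, b) prior.
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + lgamma(a + successes) + lgamma(b + failures)
            - lgamma(a + b + successes + failures))

def log_dag_score(data, parents):
    # data: list of {var: 0/1} rows; parents: {var: list of parent vars}.
    # One Beta-Bernoulli term per variable per parent configuration.
    total = 0.0
    for var, pa in parents.items():
        for config in product([0, 1], repeat=len(pa)):
            rows = [r for r in data if all(r[p] == v for p, v in zip(pa, config))]
            s = sum(r[var] for r in rows)
            total += log_beta_bernoulli(s, len(rows) - s)
    return total

# Hypothetical tiny dataset and the DAG A -> C <- B.
data = [{"A": 0, "B": 1, "C": 1},
        {"A": 1, "B": 1, "C": 0},
        {"A": 0, "B": 0, "C": 0},
        {"A": 1, "B": 0, "C": 1}]
print(log_dag_score(data, {"A": [], "B": [], "C": ["A", "B"]}))
```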

Comment author: IlyaShpitser 11 February 2014 08:24:27AM *  2 points

Thanks for this post.

The resulting distribution is not elegant (as far as I can tell).

In the binary case, the saturated model can be parameterized by p(S = 0) for S any non-empty subset of { a,b,c,d }. The submodel corresponding to H1 is just one where p({a,b} = 0) = p({a}=0)p({b}=0), and p({c,d} = 0) = p({c}=0)p({d}=0).

For Bayesians, this problem does not involve "unrestricted densities" at all.

I am sorry, Bayesians do not get to decide what my problem is. My problem involves unrestricted densities by definition. I don't think you get to keep your "fully general formalism" chops if you suddenly start redefining my problem for me.

how does one test for independence between both pairs simultaneously without assuming that the events (A independent of B) and (C independent of D) are independent?

This is a good question. I don't know a good answer to this that does not involve dealing with the likelihood in some way.

Comment author: johnswentworth 11 February 2014 05:03:52PM *  3 points

Sorry, I didn't mean to be dismissive of the general densities requirement. I mean that data always comes with a space, and that restricts the density. We could consider our densities completely general to begin with, but as soon as you give me data to test, I'm going to look at it and say "Ok, this is binary?" or "Ok, these are positive reals?" or something. The space gives the prior model. Without that information, there is no Bayesian answer.

I guess you could say that this isn't fully general because we don't have a unique prior for every possible space, which is a very valid point. For the spaces people actually deal with we have priors, and Jaynes would probably argue that any space of practical importance can be constructed as the limit of some discrete space. I'd say it's not completely general, because we don't have good ways of deriving the priors when symmetry and maximum entropy are insufficient. The Bayesian formalism will also fail in cases where the priors are non-normalizable, which is basically the formalism saying "Not enough information."

On the other hand, I would be very surprised to see any other method which works in cases where the Bayesian formalism does not yield an answer. I would expect such methods to rely on additional information which would yield a proper prior.

Regarding that ugly distribution, that parameterization is basically where the constraints came from. Remember that the Dirichlets are distributions on the p's themselves, so it's a hierarchical model. So yes, it's not hard to write down the subspace corresponding to that submodel, but actually doing an update on the meta-level distribution over that subspace is painful.

Comment author: IlyaShpitser 10 February 2014 02:33:33PM *  3 points

Say I am interested in distinguishing between two hypotheses for p(a,b,c,d) (otherwise unrestricted):

hypothesis 1: "A is independent of B, C is independent of D, and nothing else is true"

hypothesis 2: "no independences hold"

Frequentists can run their non-parametric marginal independence tests. What is the (a?) Bayesian procedure here? As far as I can tell, for unrestricted densities p(a,b,c,d) no one knows how to write down the likelihood for H1. You can do a standard Bayesian setup here in some cases, e.g. if p(a,b,c,d) is multivariate normal, in which case H1 corresponds to a (simple) Gaussian ancestral graph model. Maybe one can do some non-parametric Bayes thing (???). It's not so simple to set up the right model sometimes, which is what Bayesian methods generally need.

Comment author: johnswentworth 11 February 2014 05:56:14AM *  3 points

You should check out chapter 20 of Jaynes' Probability Theory, which talks about Bayesian model comparison.

We wish to calculate P[H1 | data] / P[H2 | data] = P[data | H1] / P[data | H2] * P[H1] / P[H2].

For Bayesians, this problem does not involve "unrestricted densities" at all. We are given some data and presumably we know the space from which it was drawn (e.g. binary, categorical, reals...). That alone specifies a unique model distribution. For discrete data, symmetry arguments mandate a Dirichlet model prior with the categories given by all possible outcomes of {A,B,C,D}. For H2, the Dirichlet parameters are updated in the usual fashion and P[data | H2] calculated accordingly.

For H1, our Dirichlet prior is further restricted according to the independencies. The resulting distribution is not elegant (as far as I can tell), but it does exist and can be updated. For example, if the variables are all binary, then the Dirichlet for H2 has 16 categories. We'll call the 16 frequencies X0000, X0001, X0010, ... with parameters a0000, a0001, ..., where the XABCD are the probabilities which the model given by X assigns to each outcome. Already, the Dirichlet for H2 is constrained to {X | sum(X) = 1, X > 0} within R^16. The Dirichlet for H1 is exactly the same function, but further constrained to the space {X | sum(X) = 1, X > 0, X00.. / X10.. = X01.. / X11.., X..00 / X..10 = X..01 / X..11} within R^16. This is probably painful to work with (analytically at the very least), but is fine in principle.
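
The unconstrained side, at least, is easy to compute exactly. Here is a minimal sketch (my own illustration, with made-up counts) of the closed-form Dirichlet-multinomial marginal likelihood P[data | H2]; the constrained H1 integral has no such closed form, which is exactly the painful part.

```python
from math import lgamma

def log_marginal_likelihood_h2(counts, alpha=1.0):
    # log P[data | H2] for an observed sequence with the given cell counts,
    # under a symmetric Dirichlet(alpha) prior over the 16 joint outcomes.
    n, k = sum(counts), len(counts)
    result = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        result += lgamma(alpha + c) - lgamma(alpha)
    return result

# Hypothetical counts for the 16 outcomes X0000, X0001, ..., X1111.
counts = [12, 7, 9, 11, 8, 10, 6, 13, 9, 8, 11, 7, 10, 12, 6, 9]
print(log_marginal_likelihood_h2(counts))
```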

So we have P[data | H1] and P[data | H2]. That just leaves the prior probabilities for each model. At first glance, it might seem that H1 has zero prior, since it corresponds to a measure-zero subset of H2. But really, we must have SOME prior information lending H1 a nonzero prior probability or we wouldn't bother comparing the two in the first place. Beyond that, we'd have to come up with reasonable probabilities based on whatever prior information we have. Given no other information besides the fact that we're comparing the two, it would be 50/50.

Of course this is all completely unscalable. Fortunately, we can throw away information to save computation. More specifically, we can discretize and bin things much like we would for simple marginal independence tests. While it won't yield the ideal Bayesian result, it is still the ideal result given only the binned data.
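
A tiny sketch (hypothetical data) of the binning step, which reduces continuous observations to a handful of categories so that the Dirichlet machinery above applies to the binned counts:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                   # continuous observations
edges = np.quantile(x, [0.25, 0.5, 0.75])   # three interior cut points
binned = np.digitize(x, edges)              # categories 0..3
print(np.bincount(binned))                  # roughly equal-count bins
```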

I am a bit curious about the non-parametric tests used for H1. I am familiar with tests for whether A and B are independent, and of course they can be applied between C and D, but how does one test for independence between both pairs simultaneously without assuming that the events (A independent of B) and (C independent of D) are independent? It is precisely this difficulty which makes the Bayesian likelihood calculation of H1 such a mess, and I am curious how frequentist methods approach it.

My apologies for the truly awful typesetting, but this is not the evening on which I learn to integrate TeX in lesswrong posts.

Comment author: johnswentworth 26 January 2014 06:06:44PM 1 point

Interesting article, but I think there's a more useful conclusion to draw from the idea that major problems in technical fields are precisely those too difficult for most people to solve. While a little extra intelligence can make the difference, an unusual tool set is just as likely to. The key is that you have to try something that nobody has tried before. If you want to solve big problems in technical fields, then your biggest relative advantage should come from studying a wide variety of other fields, in search of methods that can generalize to any of the problems you're interested in. On the other side of the equation, you need to know about lots of big problems, so you can try all your methods on a wide variety of problems.

In short, the key to solving hard problems isn't just more intelligence. The key is relative advantage. Intelligence can provide one relative advantage. Rationality and Bayesian probability/statistics can provide another, as Metamed is demonstrating. And of course, we need to find problems where our relative advantages give us the most leverage.
