Written in response to this David Deutsch presentation. Hoping it will be comprehensible enough to the friend it was written for to be responded to, and maybe a few other people too.
Deutsch says things like "theories don't have probabilities" ("there's no such thing as the probability of it"). (Content warning: every bayesian who watches the following two minutes will hate it.)
I think it's fairly clear from this that he doesn't have solomonoff induction internalized, and doesn't know how many of his objections to bayesian metaphysics it answers. In this case, I don't think he has practiced a method of holding multiple possible theories and acting with reasonable uncertainty over all of them. That would probably sound like a good thing to do to most popperians, but they often seem to have the wrong attitudes about how (collective) induction happens and might not be prepared to do it.
I am getting the sense that critrats frequently engage in a terrible Strong Opinionatedness, where they let themselves wholly believe probably-wrong theories in the expectation that this will add up to a productive intellectual ecosystem. I've mentioned this before. I think they attribute too much of the inductive process to blind selection and evolution, and underrecognise the major accelerants of it that we've developed: to extend the metaphor, extraordinarily sophisticated managed mutation and sexual reproduction; and to depart from the metaphor, the conscious, judicious, uncertain but principled design that the discursive subjects engage in, which is now primarily driving it.
He generally seems to have missed some sort of developmental window for learning bayesian metaphysics or something; the reason he thinks it doesn't work is that he visibly hasn't tied together a complete sense of the way it's supposed to. Can he please study the solomonoff inductor and think more about how priors fade away as evidence comes in, about the inherent subjectivity a person's judgements must necessarily have as a consequence of their knowing different subsets of the evidence base, and about how there is no alternative to that? He is reaching towards a kind of objectivity about probabilities that finite beings cannot attain.
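To make "priors fade away as evidence comes in" concrete, here is a minimal sketch. It is not solomonoff induction (which is uncomputable); it assumes a toy hypothesis class of three coin biases, and the priors, the data, and every name in it are made up for illustration. Two agents start with nearly opposite priors, see the same flips, and end up with almost the same posterior.

```python
# Toy illustration only: three hypothetical coin-bias hypotheses,
# standing in for a much larger hypothesis space.
BIASES = [0.3, 0.5, 0.7]  # P(heads) under each hypothesis


def update(belief, likelihoods):
    """One step of Bayes' rule: multiply by the likelihoods, then renormalise."""
    unnormalised = [p * l for p, l in zip(belief, likelihoods)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def posterior(prior, flips):
    """Update a prior over BIASES on a sequence of flips (True means heads)."""
    belief = list(prior)
    for heads in flips:
        likelihoods = [b if heads else 1.0 - b for b in BIASES]
        belief = update(belief, likelihoods)
    return belief


def total_variation(p, q):
    """Half the L1 distance between two distributions: 0 means identical."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Made-up shared evidence: 70 heads and 30 tails.
flips = ([True] * 7 + [False] * 3) * 10

# Two agents with nearly opposite (assumed) priors.
prior_a = [0.8, 0.1, 0.1]
prior_b = [0.1, 0.1, 0.8]

gap_before = total_variation(prior_a, prior_b)
gap_after = total_variation(posterior(prior_a, flips), posterior(prior_b, flips))

print(f"disagreement before evidence: {gap_before:.3f}")
print(f"disagreement after evidence:  {gap_after:.5f}")
```

The remaining disagreement is whatever the priors still contribute after 100 observations. Agents who saw different subsets of the flips would, of course, end up in different places, which is the subjectivity point above.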
His discussion of the alignment problem defies essential decision theory: he thinks that values are like tools, that they can weaken their holders if they are in some sense 'incorrect'. That Right Makes Might. Essentially Landian worship of Omohundro's Monster from a more optimistic angle: that the monster who rises at the end of a long descent into value drift will resemble a liberal society that we would want to build.
Despite this, his conclusion that a correct alignment process must have a value learning stage agrees with what the people who have internalised decision theory are generally trying to do (Stuart Russell's moral uncertainty and active value learning, MIRI's CEV process). I'm not sure who this is all for! Maybe it's just a point for his own students? Or for governments and their defense technology programmes, who may not be thinking enough, but when they do think, would tend to prefer to think in terms of national character and liberal progress? So, might that be why we need Deutsch? To speak of cosmopolitan, self-correcting approaches to AGI alignment in those fairly ill-suited terms, for the benefit of powers who will not see it in the terms of an engineering problem?
I would like to ask him if he maintains a distinction between values and preferences, morality and (well formed) desire. I prefer schools that don't. But I've never asked those who do whether they have a precise account of what moral values are, as a distinct entity from desires, maybe they have a good and useful account of values, where they somehow reliably serve the aggregate of our desires, that they just never explain because they think everyone knows it intuitively, or something. I don't. They seem too messy to prove correctness of.
Error: Prediction that humans may have time to integrate AGI-inspired mental augmentation horse exoskeletons in the short span of time between the creation of AGI and its accidental release and ascension. Neuralink will be useful, but not for that. We are stones milling about at the base of what we should infer to be a great mountain of increasing capability, and as soon as we learn to make an agent that can climb the mountain at all it will strengthen beyond our ken long before we can begin to figure out where to even plug our prototype cognitive orthotics in.
I think quite a lot of this might be a reaction to illiberal readings of Bostrom's Black Ball paper (he references it pretty clearly)... I don't know if anyone has outwardly posed such readings. Bostrom doesn't really seem eager to go there and wrestle with the governance implications himself? (one such implication: a transparent society of mass surveillance. Another: The period of the long reflection, a calm period of relative stasis), but it's understandable that Deutsch would want to engage it anyway even if nobody's vocalizing it, it's definitely a response that is lurking there.
The point about how a complete cessation of the emergence of new extinction risks would be much less beautiful than an infinite but finitely convergently decreasing series of risks is interesting. I'm not convinced that those societies are going to turn out to look all that different in practice. But I'll try to carry it with me.
Yes, knowledge creation is an unending, iterative process. It could only end if we come to the big objective truth, but that can't happen (the argument for why is in BoI - the beginning of infinity).
I think this is true of any two *rational* people with sufficient knowledge, and it's rationality, not being a bayesian, that's important. If two partially *irrational* bayesians talk, then there's no reason to think they'd reach agreement on ~everything.
There is a subtle case with regards to creative thought, though: take two people who agree on ~everything. One of them has an idea, they now don't agree on ~everything (but can get back to that state by talking more).
WRT "sufficient knowledge": the two ppl need methods of discussing which are rational, and rational ways to resolve disagreements and impasse chains. they also need attitudes about solving problems. namely that any problem they run into in the discussion is able to be solved and that one or both of them can come up with ways to deal with *any* problem when it arises.
If it were meaningless I wouldn't have had to add "in an absolute sense". Just because an explanation is wrong in an *absolute* sense (i.e. it doesn't perfectly match reality) does not mean it's not *useful*. Fallibilism generally says it's okay to believe things that are false (which all explanations are, in some case); however, there are conditions on when that's okay, such as there being no known unanswered criticisms and no alternatives.
Since BoI there has been more work on this problem and the reasoning around when to call something "true" (practically speaking) has improved - I think. Particularly:
I think he's in a tough spot to try and explain complex, subtle relationships in epistemology using a language where the words and grammar have been developed, in part, to be compatible with previous, incorrect epistemologies.
I don't think he defines things poorly (at least typically); and would acknowledge an incomplete/fuzzy definition if he provided one. (Note: one counterexample is enough to refute this claim I'm making)
I think you misunderstand me.
let's say you wanted a pet, we need to make a conjecture about what to buy you that will make you happy (hopefully without developing regret later). the possible set of pets to start with are all the things that anyone has ever called a pet.
with something like this there will be lots of other goals, background goals, which we need to satisfy but don't normally list. An example is that the pet doesn't kill you, so we remove snakes, elephants, and other things that might hurt you. there are other background goals like life of the pet or ongoing cost; adopting you a cat with operable cancer isn't a good solution.
there are maybe other practical goals too, like it should be an animal (no pet rocks), should be fluffy (so no fish, etc), shouldn't cost more than $100, and yearly cost is under $1000 (excluding medical but you get health insurance for that).
maybe we do this sort of refinement a bit more and get a list like: cat, dog, rabbit, mouse
you might be *happy* with any of them, but can you be *more happy* with one than any other; is there a *best* pet? **note: this is not an optimisation problem** b/c we're not turning every solution into a single unit (e.g. your 'happiness index'); we're providing *decisive reasons* for why an option should or shouldn't be included. We've also been using this term "happy" but it's more than just that, it's got other important things in there -- the important thing, though, is that it's your *preference* and it matches that (i.e. each of the goals we introduce are in fact goals of yours; put another way: the conditions we introduce correspond directly and accurately to a goal)
this is the sort of case where there's no gun to anyone's head, but we can continue to refine down to a list of exactly **one** option (or zero). let's say you wanted an animal you could easily play with -> then rabbit, mouse are excluded, so we have options: cat, dog. If you'd prefer an animal that wasn't a predator - both cat, dog are excluded and we get to zero (so we need to come up with new options or remove a goal). If instead you wanted a pet that you could easily train to use a litter tray, well we can exclude a dog so you're down to one. Let's say the litter tray is the condition you imposed.
What happens if I remember ferrets can be pets and I suggest that? well now we need a *new* goal to find which of the cat or ferret you'd prefer.
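The refinement so far can be sketched in a few lines of Python. This is only an illustration of the steps above, not an implementation of anything from the yes/no philosophy material: the names `EXCLUDED_BY` and `refine` are made up, and the table just records which options each goal was said to rule out in this example.

```python
# Which options each goal rules out, as stated in the example above.
# (Hypothetical encoding; a goal either refutes an option or it doesn't.)
EXCLUDED_BY = {
    "easy to play with": {"rabbit", "mouse"},
    "not a predator": {"cat", "dog"},
    "litter-tray trainable": {"dog"},
}


def refine(options, goals):
    """Drop every option that any of the goals rules out; keep the rest in order."""
    remaining = list(options)
    for goal in goals:
        remaining = [o for o in remaining if o not in EXCLUDED_BY[goal]]
    return remaining


shortlist = ["cat", "dog", "rabbit", "mouse"]

# Playable + litter tray: exactly one option survives.
print(refine(shortlist, ["easy to play with", "litter-tray trainable"]))

# Playable + not a predator: nothing survives, so we need new options
# or have to drop a goal.
print(refine(shortlist, ["easy to play with", "not a predator"]))

# A new conjecture (ferret) that no current goal rules out: we are back
# to two options and need a new goal to decide between them.
print(refine(shortlist + ["ferret"], ["easy to play with", "litter-tray trainable"]))
```

Nothing here is scored or weighted: each goal gives a yes/no verdict on each option, and no "happiness index" is computed anywhere.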
Note: for most things we don't go to this level of detail b/c we don't need to; like if you have multiple apps to choose from that satisfy all your goals you can just choose one. If you find out a reason it's not good, then you've added a new goal (if you weren't originally mistaken, that is) and can go back to the list of other options.
Note 2: The method and framework I've just used wrt the pet problem is something called yes/no philosophy and has been developed by Elliot Temple over the past ~10+ years. Here are some links:
Argument · Yes or No Philosophy, Curiosity – Rejecting Gradations of Certainty, Curiosity – Critical Rationalism Epistemology Explanations, Curiosity – Critical Preferences and Strong Arguments, Curiosity – Rationally Resolving Conflicts of Ideas, Curiosity – Explaining Popper on Fallible Scientific Knowledge, Curiosity – Yes or No Philosophy Discussion with Andrew Crawshaw
Note 3: During the link-finding exercise I found this: "All ideas are either true or false and should be judged as refuted or non-refuted and not given any other status – see yes no philosophy." (credit: Alan Forrester) I think this is a good way to look at it; *technically and epistemically speaking:* true/false is not a judgement we can make, but refuted/non-refuted *is*. we use refuted/non-refuted as a proxy for false/true when making decisions, because (as fallible beings) we cannot do any better than that.
I'm curious about how a bayesian would tackle that problem. Do you just stop somewhere and say "the cat has a higher probability so we'll go with that"? Do you introduce goals like I did to eliminate options? Is the elimination of those options equivalent to something like: reducing the probability of those options being true to near-zero? (or absolute zero?) Can a bayesian use this method to eliminate options without doing probability stuff? If a bayesian *can*, what if I conjecture that it's possible to *always* do it for *all* problems? If that's the case there would be a way to decisively reach a single answer - so no need for probability. (There's always the edge case that there was a mistake somewhere, but I don't think there's a meaningful answer to problems like "P(a mistake in a particular chain of reasoning)" or "P(the impact of a mistake is that the solution we came to changes)" -- note: those P(__) statements are within a well defined context like an exact and particular chain of reasoning/explanation.)
So we can make decisions.
Yes you do - you need a theory of expected utility; how to measure it, predict it, manipulate it, etc. You also need a theory of how to use things (b/c my expected utility of amazing tech I don't know how to use is 0). You need to believe these theories are true, otherwise you have no way to calculate a meaningful value for expected utility!
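For concreteness, this is the textbook calculation being referred to, as a minimal sketch. The function name and all the numbers are made up for illustration; the point is that every probability and every utility fed into it has to come from some theory you are relying on.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over (probability, utility) pairs.

    The probabilities are assumed to describe mutually exclusive outcomes
    and must sum to 1.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)


# Hypothetical numbers: amazing tech I have no idea how to use.
# The only outcome is "can't use it", worth nothing to me.
tech_i_cannot_use = [(1.0, 0.0)]

# Same tech, but with a theory that says I have a 25% chance of
# learning to use it, in which case it's worth 100 to me.
tech_i_might_learn = [(0.25, 100.0), (0.75, 0.0)]

print(expected_utility(tech_i_cannot_use))
print(expected_utility(tech_i_might_learn))
```

The 25% and the 100 are exactly the kind of inputs that need a theory behind them; change the theory and the number changes.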
Yes, I additionally claim we can operate **decisively**.
It matters more for big things, like SENS and MIRI. Both are working on things other than key problems; there is no good reason to think they'll make significant progress b/c there are other more foundational problems.
I agree practically a lot of decisions come out the same.
I don't know why they would be risible -- nobody has a good reason why his ideas are wrong to my knowledge. They refute a lot of the fear-mongering that happens about AGI. They provide reasons for why a paperclip machine isn't going to turn all matter into paperclips. They're important because they refute big parts of theories from thinkers like Bostrom. That's important because time, money, and effort are being spent in the course of taking Bostrom's theories seriously, even though we have good reasons they're not true. That could be time, money, and effort spent on more important problems like figuring out how creativity works. That's a problem which would actually lead to the creation of an AGI.
Calling unanswered criticisms *risible* seems irrational to me. Sure unexpected answers could be funny the first time you hear them (though this just sounds like ppl being mean, not like it was the punchline to some untold joke) but if someone makes a serious point and you dismiss it because you think it's silly, then you're either irrational or you have a good, robust reason it's not true.
He doesn't claim this at all. From memory the full argument is in Ch7 of BoI (though has dependencies on some/all of the content in the first 6 chapters, and some subtleties are elaborated on later in the book). He expressly deals with the case where an AGI can run like 20,000x faster than a human (i.e. arbitrarily fast). He also doesn't presume it needs to be raised like a human child or take the same resources/attention/etc.
Have you read much of BoI?