How many people here agree with Holden? [Actually, who agrees with Holden?]


by private_messaging

14th May 2012


I was wondering - what fraction of people here agree with Holden's advice regarding donations, and his arguments? What fraction assumes there is a good chance he is essentially correct? What fraction finds it necessary to determine whether Holden is essentially correct in his assessment before working on counterarguments, acknowledging that such an investigation should be able to result in the dissolution or suspension of SI?

It would seem to me, from the response, that the chosen course of action is to try to improve the presentation of the argument, rather than to try to verify the truth values of the assertions (with the non-negligible likelihood of the assertions being found false instead). This strikes me as a very odd stance.

Ultimately: why does SI seem certain that it has badly presented some valid reasoning, rather than tried to present some invalid reasoning?

edit: I am interested in knowing why people agree/disagree with Holden, and what likelihood they give to him being essentially correct, rather than a number or a ratio (that would be subject to selection bias).


[-]Emile13y340

I think most people on this site (including me and you, private messaging/Dmytry) don't have any particular insight that gives them more information than those who seriously thought about this for a long time (like Eliezer, Ben Goertzel, Robin Hanson, Holden Karnofsky, Lukeprog, possibly Wei Dai, cousin_it, etc.), so our opinion on "who is right" is not worth much.

I'd much rather see an attempt to cleanly map out where knowledgeable people disagree, rather than polls of what ignorant people like me think.

Similarly, if two senior economists have a public disagreement about international trade and fiscal policy, a poll of a bunch of graduate students on those issues is not going to provide much new information to either economist.

(I don't really know how to phrase this argument cleanly, help and suggestions welcome, I'm just trying to retranscribe my general feeling of "I don't even know enough to answer, and I suspect neither do most people here")

[-]falenas10813y120

(I don't really know how to phrase this argument cleanly, help and suggestions welcome, I'm just trying to retranscribe my general feeling of "I don't even know enough to answer, and I suspect neither do most people here")

I would phrase it as holding off judgement until we hear further information, i.e. SI's response to this. And in addition to the reasons you give, not deciding who's right ahead of time helps us avoid becoming attached to one side.

8Emile13y

I think what's needed isn't further information as much as better intuitions, and getting those isn't just a matter of reading SIAI's response. A bit like if there's a big public disagreement between two primatologists that spent years working with chimps in Africa, about the best way to take a toy from a chimp without your arm getting ripped off. At least one of the primatologists is wrong, but even after hearing all of their arguments, a member of the uninformed public can't really decide between them, because their positions are based on a bunch of intuitions that are very hard to communicate. Deciding "who is wrong" based on the public debate would be working from much less information than either of the parties (provided nobody appears obviously stupid or irrational or dishonest even to a member of the public). People seem more ready to pontificate on AI and the future and morality than on chimpanzees, but I don't think we should be. The best position for laymen on a topic on which experts disagree is one of uncertainty.

[-]prase13y100

The primatologists' intuitions would probably stem from their direct observations of chimps. I would trust their intuitions much less if they were based on long serious thinking about primates without any observation, which is likely the more precise analogy of the positions held in the AI risk debate.

7asr13y

AGI research is not an altogether well-defined area. There are no well-established theorems, measurements, design insights, or the like. And there is plenty of overlap with other fields, such as theoretical computer science. My impression is that many of the people commenting have enough of a computer science, engineering, or math background to be worth listening to. The LW community takes Yudkowsky seriously when he talks about quantum mechanics -- and indeed, he has cogent things to say. I think we ought to see who has something worth saying about AGI and risk.

-5private_messaging13y

0private_messaging13y

Well, I spent many years of my life studying technical topics, and have certain technical accomplishments, so it is generally a bad strategy for me to assume superior knowledge for anyone who 'thought longer' about a subject; especially if it may be the case that it is not very hard to see that nothing conclusive could be concluded about the topic at the time, or the tools (mathematics) are not yet where they should be. Furthermore, if I am to look at your list, those whom I disagree the most with (Eliezer, Lukeprog) appear to have the least training in the subject matter (and jumped onto a very difficult subject without doing any notable training with good feedback). Lukeprog in particular had been doing theology until the age of 22 and inevitably picked up plenty of bad habits of thought; if I were him I would stay clear of things that can trigger old conditioned theist instincts, leaving those perhaps to people who never had such instincts conditioned into them in the first place. (I originally thought Luke was making BS but it seems to me now he is only still acting as a vehicle for the religious BS and could improve over time)

[-]TheOtherDave13y220

If you're actually interested in the answer to the question you describe yourself as wondering about, you might consider setting up a poll.

Conversely, if you're actually interested in expressing the belief that Holden is essentially correct while phrasing it as a rhetorical question for the usual reasons, then a poll isn't at all necessary.

4private_messaging13y

Well, maybe it is poorly worded, I'd rather also know who here thinks that Holden is essentially correct. What probability would you give to Holden being essentially correct? Why?

5TheOtherDave13y

I'm going to read between the lines a little, and assume that "Holden is essentially correct" here means roughly that donating money to SI doesn't significantly reduce human existential risk. (Holden says a lot of stuff, some of which I agree with more than others.) I'm >.9 confident that's true. Holden's post hasn't significantly altered my confidence of that. Why do you want to know?

5private_messaging13y

Well, he estimated the expected effect on risk as an insignificant increase in risk. That is to me the strong point; the 'does not reduce' is a weak version prone to eliciting a Pascal's wager type response.

6TheOtherDave13y

I am >.9 confident that donating money to SI doesn't significantly increase human existential risk. (Edit: Which, on second read, I guess means I agree with Holden as you summarize him here. At least, the difference between "A doesn't significantly affect B" and "A insignificantly affects B" seems like a difference I ought not care about.) I also think Pascal's Wager type arguments are silly. More precisely, given how unreliable human intuition is when dealing with very low probabilities and when dealing with very large utilities/disutilities, I think lines of reasoning that rely on human intuitions about very large very-low-probability utility shifts are unlikely to be truth-preserving. Why do you want to know?

2Luke_A_Somers13y

On that, I'm pretty sure that the SI would not rush that way. Consider the parable of the dragon. This isn't the story of someone who's willing to cut corners, but of someone who accepts that delays for checking, even delays that cause people to die, are necessary. Plus, if they develop a clear enough architecture, so one can query what the AI is thinking, then one would be able to see potential future failures while still in testing, without having to have those contingencies actually occur. That will be one of the keys, I think. Make the AI's reasons something that we can follow, even if we couldn't generate those arguments on a reasonable time-frame.

[-]Shmi13y200

I agree with HK that at this point SI should not be one of the priority charities supported by GiveWell, mainly due to the lack of demonstrated progress in the stated area of AI risk evaluation. If and when SI publishes peer-reviewed papers containing new insights into the subject matter, clearly demonstrating the dangers of AGI and providing a hard-to-dispute probability estimate of the UFAI takeover within a given time frame, as well as outlining constructive ways to mitigate this risk ("solve the friendliness problem" is too vague), GiveWell should reevaluate its stance.

On the other hand, the soon-to-be-spawned Applied Rationality org will have to be evaluated on its own merits, and is likely to have an easier time meeting GiveWell's requirements, mostly because the relevant metrics (of "raising the sanity waterline") can be made so much more concrete and near-term.

6[anonymous]13y

I disagree. As far as I can tell, there has been very little progress on the rationality verification problem (see this thread). I don't think anyone at CFAR or GiveWell knows what the relevant metrics really are and how they can be compared with, say, QALYs or other approximations of utility.

2Shmi13y

First, this seems like a necessary stepping stone toward any kind of FAI-related work, and so cannot be skipped. Indeed, if you cannot tell which of two entities in front of you is more rational than the other, what hope do you have of solving a much larger problem of proven friendliness, of which proven rationality is only a tiny part? Anyway, this limited-scope project (consistently ordering people by rationality level in a specific setting) should be something rather uncontroversial and achievable.

3[anonymous]13y

It's one stepping-stone to FAI work, but other things could be substituted for it, like publishing lots and lots of well-received peer-reviewed papers. I don't think it is. What exactly is "rationality level," and how would it be measured? There's no well-defined quantity that you can scan and get a measure of someone's rationality. Even "winning" isn't that good of a metric.

2Shmi13y

This is a harder question than "which one of two given behaviors is more rational in a given setting?", that's why I suggested starting with the latter. Once you accumulate enough answers like that, you can start assembling them into a more general metric.

5[anonymous]13y

I maintain that this is a very hard problem. We know what the correct answers to various cognitive bias quizzes are (e.g. the conjunction fallacy questions), but it's not clear that aggregating a lot of these tests corresponds to what we really mean when we say "rationality".

3Shmi13y

Gotta start somewhere. I proposed a step that may or may not lead in the right direction; leveling a criticism that it does not solve the whole problem at once is not very productive. Even the hardest problems tend to yield to incremental approaches. If you have a better idea, by all means, suggest it.

1[anonymous]13y

I'm not trying to be negative for the sake of being negative, or even for the sake of criticizing your proposal--I was disagreeing with your prediction that CFAR will have an easier time of meeting GiveWell's requirements. (I actually like your proposal quite a bit, and I think it's an avenue that CFAR should investigate. But I still think that the verification problem is hard, and hence I predict that CFAR will not be very good at providing GiveWell with a workable rationality metric.)

4XiXiDu13y

I'd like to emphasize that part.

[-]Gastogh13y180

I found HK's analysis largely sound (based on what I could follow, anyway), but it didn't have much of an effect on my donation practices. The following outlines my reasoning for doing what I do.

I have no feasible way to evaluate SIAI's work firsthand. I couldn't do that even if their findings were publicly available, and it's my default policy to reject the idea of donating to anyone whose claims I can't understand. If donating were a purely technical question, and if it came down to nothing but my estimate of SIAI's chances of actually making groundbreaking research, I wouldn't bet on them to be the first to build an AGI, never mind a FAI. (Also, on a more cynical note, if SIAI were simply an elaborate con job instead of a genuine research effort, I honestly wouldn't expect to see much of a difference.)

However, I can accept the core arguments for fast AI and uFAI to such a degree that I think the issue needs addressing, whatever that answer turns out to be. I view the AI risk PR work SIAI does as their most important contribution to date. Even if they never publish anything again, starting today, and even if they'll never have a line of code to show for anything, I estimate their... (read more)

[-]XiXiDu13y150

I believe that SI is a valuable organisation and would be pleased if they were to keep their current level of funding.

I believe that withholding funds won't work very well and that they are rational and intelligent enough to sooner or later become aware of their shortcomings and update accordingly.

4Dan_Moore13y

I agree with this conclusion and also Karnofsky's assessment that the hypotheses currently espoused by SI about how AI will play out are very speculative.

3Rain13y

Do you feel this conflicts with opinions expressed on your blog? If not, why not?

[-]XiXiDu13y100

Do you feel this conflicts with opinions expressed on your blog? If not, why not?

Your question demands a thoughtful reply. I don't have the time to do so right now.

Maybe the following snippet from a conversation with Holden can shed some light on what is really a very complicated subject:

I even believe that SIAI, even given its shortcomings, is valuable. It makes people think, especially the AI/CS crowd, and causes debate.

I certainly do not envy you for having to decide if it is a worthwhile charity.

What I am saying is that I wouldn't mind if it kept its current funding. Although if I believed that there was even a small chance that they could be building the kind of AI that they envision, then in that case I would probably actively try to make them lose funding.

My position is probably inconsistent and highly volatile.

Just think about it this way. If you asked me if I do desire a world state where people like Eliezer Yudkowsky are able to think about AI risks, then I would say yes. If you asked me how come I wouldn't allocate the money to protect poor people against malaria, then I can only admit that I don't have a good answer. That is an extremely difficult problem.

As I said, ... (read more)

4Rain13y

Okay.

0[anonymous]13y

Do you feel these views conflict with calling their views "Bullshit!" (emphasis yours) on your blog? If not, why not?

-5private_messaging13y

[-]Rain13y150

Having only read the headline, I came to this thread with the intention of saying that I agree with much of what he said, up to and potentially including withholding further funds from SI.

But then I read the post and find it's asking a different but related question, paraphrased as, "Why doesn't SI just lie down and die now that everyone knows none of their arguments have a basis in reality?" Which I'm inclined to disagree with.

0private_messaging13y

No, what I complained about is the lack of work on SI's part to actually try to check if it is correct, knowing that a negative result would mean that it has to dissolve. Big difference. SI should play Russian roulette (with reality and logic as the revolver) now - it is sure the bullet is not in the chamber - and maybe die if it was wrong.

1Rain13y

So you think they should work on papers, posts, and formal arguments?

-3private_messaging13y

I think they should work more on 'dissolving if their work is counter-productive', i.e. incorporate some self-evaluation/feedback, which, if consistently negative, would lead to not asking for any more money. To not do that makes them a scam scheme, plain and simple. (I do not care that you truly believe there is an invisible dragon in your garage, if you never tried to, say, spread flour to see it, or otherwise check. Especially if you're the one repackaging that dragon thing for popular consumption.)

1Rain13y

What SI activity constitutes the 'spreading flour' step in your analogy?

0private_messaging13y

I'm speaking of the feedback Holden described. In that case, the belief in one's own capabilities is the dragon.

1Rain13y

Yes, I understand the analogy and how it applies to SI, except the 'spreading flour' step where they test them. What actions should they take to perform the test?

3private_messaging13y

Well, for example, Eliezer can try to actually invent something technical, most likely fail (most people aren't very good at inventing), and then cut down his confidence in his predictions about AI (and especially in intuitions, because the dangerous AI is an incredibly clever inventor of improvements to itself, and you'd better be a good inventor or your intuitions from internal self-observation aren't worth much). On a more meta level they can sit and think - how do we make sure we aren't mistaken about AI? Where could our intuitions be coming from? Are we doing something useful or have we created a system of irreducible abstractions? etc. This should have been done well before Holden's post. edit: i.e. essentially, SI is doing a lot of symbol manipulation type activity to try to think about AI. Those symbols may represent some irreducible flawed concepts, in which case manipulating them won't be of any use.

[-]vi21maobk9vp13y130

I agree with Holden and additionally it looks like AGI discussions have most of the properties of mindkilling.

These discussions are about policy. They are about policy affecting the medium-to-far future. These policies cannot be founded on reliable scientific evidence. Bayesian inquiry heavily depends on priors, and there is nowhere near enough data to tip the balance.

As someone who practices programming and has studied CS, I find Hanson and AI researchers and Holden more convincing than Eliezer_Yudkowsky or lukeprog. But this is more prior-based than evidence-based. Nearly all that the arguments by both sides do is just bringing a system to your priors. I cannot judge which side gives more odds-changing data because arguments from one side make way more sense and I cannot factor out the original prior dissonance with the other side.

The arguments about "optimization done better" don't tell us anything about position of fundamental limits to each kind of optimization; with a fixed computronium type it is not clear that any kind of head start would ensure that a single instance of AI would beat an instance based on 10x computronium older than 1 week (and partitioning the wo... (read more)

[-]Zack_M_Davis13y100

relatively recent "So you want to be a Seed AI Programmer" by Eliezer_Yudkowsky [...] maybe it should be either declared obsolete in public

(I believe that document was originally written circa 2002 or 2003, the copy mirrored from the Transhumanist Wiki (which includes comments as recent as 2009) being itself a mirror. "Obsolete" seems accurate.)

[-]A4FB53AC13y130

Suppose that SI now activates its AGI, unleashing it to reshape the world as it sees fit. What will be the outcome? I believe that the probability of an unfavorable outcome - by which I mean an outcome essentially equivalent to what a UFAI would bring about - exceeds 90% in such a scenario. I believe the goal of designing a "Friendly" utility function is likely to be beyond the abilities even of the best team of humans willing to design such a function. I do not have a tight argument for why I believe this.

My immediate reaction to this was "as opposed to doing what?" In this segment it seems to be argued that SI's work, raising awareness that not all paths to AI are safe, and that we should strive to find safer paths towards AI, is actually making it more likely that an undesirable AI / Singularity will be spawned in the future. Can someone explain to me how not discussing such issues and not working on them would be safer?

Just having that bottom line unresolved in Holden's post makes me reluctant to accept the rest of the argument.

4Viliam_Bur13y

Seems to me that Holden's opinion is something like: "If you can't make the AI reliably friendly, just make it passive, so it will listen to humans instead of transforming the universe according to its own utility function. Making a passive AI is safe, but making an almost-friendly active AI is dangerous. SI is good at explaining why almost-friendly active AI is dangerous, so why don't they take the next logical step?" But from SI's point of view, this is not a solution. First, it is difficult, maybe even impossible, to make something passive and also generally intelligent and capable of recursive self-improvement. It might destroy the universe as a side effect of trying to do what it perceives as our command. Second, the more technology progresses, the easier it will be to build an active AI. Even if we build a few passive AIs, it does not prevent some other individual or group from building an active AI and using it to destroy the world. Having a blueprint for a passive AI will probably make building an active AI easier. (Note: I am not sure I am representing Holden's or SI's views correctly, but this is how it makes most sense to me.)

[-]AlanCrowe13y110

Artificial Intelligence dates back to 1960. Fifty years later it has failed in such a humiliating way that it was not enough to move the goal posts; the old, heavy wooden goal posts have been burned and replaced with lightweight portable aluminium goal posts, suitable for celebrating such achievements as from time to time occur.

Mainstream researchers have taken the history on board and now sit at their keyboards typing in code to hand-craft individual, focused solutions to each sub-challenge. Driving a car uses drive-a-car vision. Picking a nut and bolt from a component bin has nut-and-bolt vision. There is no generic see-vision. This kind of work cannot go FOOM for deep structural reasons. All the scary AI knowledge, the kind of knowledge that the pioneers of the 1960's dreamed of, stays in the brains of the human researchers. The humans write the code. Though they use meta-programming, it is always "well-founded" in the sense that level n writes level n-1, all the way down to level 0. There is no level n code rewriting level n. That is why it cannot go FOOM.

Importantly, this restraint is enforced by a different kind of self-interest than avoiding existential risk. The... (read more)

[-]MixedNuts13y90

I agree with Holden about everything.

Edit: Not that I'm complaining, but why is this upvoted? It's rather low on content.

6CuSithBell13y

Possibly it's upvoted to encourage responses to the post - that is, it's high-content relative to not posting?

8thomblake13y

There's been a feature request around for a while, to allow voting on non-existent comments, which if implemented could balance that out.

7CuSithBell13y

Before you posted this, I precommitted to upvote it if you didn't post it, if I predicted you'd upvote this post if you did post. I think? I guess I'm not very good at acausal/counterfactual blackmail.

2JoshuaZ13y

The problem may be more connected to limited stack size than anything else.

2CuSithBell13y

Hey!

4TheOtherDave13y

Yeah, for some reason those never show up on my browser.

0JoshuaZ13y

Is this meant as a joke? My first thought was that this was a joke, but it then occurred to me that having an efficient way of precommitting to upvoting certain types of comment when they appear wouldn't be so bad.

4thomblake13y

Actually, it was marked as Wontfix.

2thomblake13y

It's sort of a joke. I don't see any way of implementing the feature, but the rationale is sound.

2Emile13y

Not one of the upvoters, but I suspect it's just a way of saying "so do I".

-1fubarobfusco13y

Phygvfg!

7MixedNuts13y

But... I'm agreeing with the leader of the other cult... doesn't that make me a heretic?

[-]amcknight13y60

Please use more links. You should link to the post you're referring to, and probably a link to who Holden is and maybe even to what SI is.

-10private_messaging13y

[-]Normal_Anomaly13y50

I agree with HK that SIAI is not one of the best charities currently out there. I also agree with him that UFAI is a threat, and getting FAI is very difficult. I do not agree with HK's views on "tools" as opposed to "agents", primarily because I do not understand them fully. However I am fairly confident that if I did understand them I would disagree. I currently send all my charitable donations to AMF, but am open to starting to support SIAI when I see them publish more (peer-reviewed) material.

I believe SIAI believes it needs to prese... (read more)

[-]scientism13y50

SI is a very narrowly focused institute. If you don't buy the whole argument, there's very little reason to donate. I'm not sure SI should dissolve, I think they can reform. It's pretty obvious from their output that SI is essentially a machine ethics think tank. The obvious path to reform is greater pluralism and greater relevance to current debate. SI could focus on being the premier machine ethics think tank, get involved in current ethical debates around the uses of AI, develop a more flexible ethical framework, and keep the Friendliness and Intellige... (read more)

[-]TheOtherDave13y160

I generally expect that broad-focus organizations with a lot of resources and multiple constituencies will end up spending a LOT of their resources on internal status struggles. Given what little I've seen about SI's skill and expertise at managing the internal politics of such an arrangement, I would expect the current staff to be promptly displaced by more skillful politicians if they went down this road, and the projects of interest to that staff to end up with even fewer resources than they have now.

[-]Will_Newsome13y120

Given what little I've seen about SI's skill and expertise at managing the internal politics of such an arrangement, I would expect the current staff to be promptly displaced by more skillful politicians if they went down this road, and the projects of interest to that staff to end up with even fewer resources than they have now.

I think this has already happened to some extent. Reflective people who have good epistemic habits but who don't get shit done have had their influence over SingInst policy taken away while lots of influence has been granted to people like Luke and Louie who get lots of shit done and who make the organization look a lot prettier but whose epistemic habits are, in my eyes, relatively suspect.

2Karmakaiser13y

Could you expand a bit more on why you think the epistemic habits are suspect compared to previous staffers?

2Eugine_Nier13y

I think there's an important lesson here about the relative importance of being able to get shit done versus good epistemic habits.

0David Althaus13y

Are you talking about guys like e.g. Steve Rayhawk or Peter de Blanc?

2scientism13y

You're probably correct. The current staff would have the same problem of establishing legitimacy they have now but within the context of the larger organisation.

[-]komponisto13y120

If you don't buy the whole argument, there's very little reason to donate

I disagree, and so apparently do some of SI's major donors.

[-]kfre243513y40

I entered this post expecting to discuss Holden Caulfield and even pulled my copy off my bookshelf. Another time.

[-]khafra13y30

I expect the process of rigorously formalizing strong intuitions in a somewhat adversarial setting--or "improving the presentation of the argument"--to present strong evidence on the severity of the problems Holden pointed out.

2private_messaging13y

And if the intuitions were not a product of some valid but subconscious inference (which I wouldn't expect them to be), how will that process of 'rigorously formalizing' be different from rationalization? Note that you have to be VERY rigorous - mathematical proof-grade - to be unable to rationalize a false statement. I think inference at that level of rigour is of comparable difficulty to creation of the superhuman AGI in the first place.

2Luke_A_Somers13y

It needs to be an argument that Holden would buy with his different intuitions. Doesn't that help substantially?

4private_messaging13y

It also helps to be wrong, if we are to assume that there exist false arguments that Holden would buy. You know what would work to instantly change my opinion from 'dilettantes technobabbling' to 'geniuses talking of stuff I don't always understand'? If it is shown, using math, that AIXI is going to go evil. Quick googling finds this: http://www.mail-archive.com/agi@v2.listbox.com/msg00749.html Eliezer had 9 years to prove using math that AIXI is going to do something evil. Or to define something provably evil that is similar to AIXI (it wouldn't raise the existential risk if AIXI is this evil, would it?). I would accept it even if it uses some Solomonoff induction oracle.

4Pfft13y

My guess of what Eliezer had in mind for (2) is that if you control it by hooking up a reward button to it, then AIXI approximates an Outcome Pump and this is a Bad Thing. But if that's the problem, it also illustrates why a formal proof of unfriendliness is a rather tall order. It's easy to formally specify what AIXI, or the Outcome Pump, will do. But in order to prove that that's not what we want, we also need a formal specification of what we want. Which is the fundamental problem of friendliness theory.

0private_messaging13y

Keep in mind that my opinion is that the whole so-called 'theory' of his is about specifying intelligences in English/technobabble so that they would be friendly (which is also specified in English/technobabble), which is of no use whatsoever (albeit it may be intellectually stimulating, and my first impression was that it was some sorta weird rationality training game here, before I noticed folk seriously wanting to donate hard-earned dollars for this 'work'). One could for example show formally that AIXI does not discriminate between a wireheaded (input-manipulating) solution and a non-wireheaded solution; that would make it rather non-scary; or one could show that it does, which would make it potentially scary. Ultimately, excuses like "But in order to prove that that's not what we want, we also need a formal specification of what we want" are a very bad sign.

3JoshuaZ13y

Eliezer had decided that AIXI isn't a serious threat, so in that time his views seem essentially to have changed. See this conversation. The point about designing something similar to AIXI does seem to be valid though.

5private_messaging13y

Ahh, okay. I was about to try to write for Holden at least a sketch of a proof, based on symmetry, that AIXI is benign. In any case, if AIXI is benign, that constitutes an example of a benign system that can be used as an incredibly powerful tool, and is not too scary when made into an agent. Whether we call something generally intelligent if it opts to drop an anvil on its head is a matter of semantics.

-4Manfred13y

I think there's a policy of disbelieving people when they say "you know what would instantly change my opinion?" So I think I'll disbelieve you.

5JoshuaZ13y

Actively disbelieving people when they state explicitly what will convince them to change their mind seems like a bad policy.

2Manfred13y

I suppose I should be more specific - I disbelieve people when they ask for additional evidence about something they are treating adversarially, claiming it would reverse their position. Because people ask for additional evidence a lot, and in my experience it's much more likely that it's what they think sounds like a good justification for their point of view, or an interesting thing to mention. The signal is lost in the noise. Also see the story here.

[-]JoshuaZ13y100

The problem with that is that a basic rationality technique is to ask oneself what would make you change your mind. And in fact that's a pretty useful technique. It is useful to check whether something is actually someone's true rejection, but that's distinct from a blanket assumption of disbelief. Frankly, this also worries me, because I try to be clear about what would actually convince me when I'm having a disagreement with someone, and your attitude, if it became widespread, would make that actively unproductive. It might make more sense to instead look carefully at when people say that sort of thing and see whether they have any history of actually changing their positions when confronted with evidence.

-8Manfred13y

5XiXiDu13y

You can believe me if I tell you that it would instantly change my opinion to see the moon turned into paperclips.

0Manfred13y

Depends on what it would change your opinion of :D

2private_messaging13y

Well, I would need to read through the proof, so it wouldn't literally be instantaneous, but it'd be a rather strong point. I would recommend considering the possibility that making such proofs, or at least trying to, would change someone's opinion, even if you think that it wouldn't change mine (yeah, I guess from your point of view, if some vague handwaving doesn't change my opinion, then nothing else will). Ultimately, if in a technical subject you have strong opinions and intuitions and stuff, and you aren't generating some proofs (at least the proofs that you think may help attack the final problem), then my opinion of your opinion is going to be well below my opinion of that paper by the Bogdanov brothers.

1Manfred13y

What AIXI maximizes is the sum of some reward ( r ) over all the steps ( k ) of a Turing machine. On page 8, Hutter defines the reward as a function of the input string "x" at step k. So x depends on the step: it's x(k). And r depends on x: it's r(x(k)). Consider offering AIXI this choice. What makes people refuse to hop in a simulator? Well, because it wouldn't be real. The customer values reality, as they perceive it according to previous inputs ( all the previous x(<k) ), and some internal programming we get born with. But AIXI does not value acting in reality. It values the reward r, which is a function only of the current input x(k). If you could permanently change x to something with a high r, AIXI would consider this a high-value outcome. Imagine that x(k) comes through a bunch of wires, and starts out coming from sensors in reality. If AIXI could order a robot to swap all the wires to a signal with a high reward r(x(k)) at each step, it would do so, because that would maximize the sum of r. Okay, so a minor simplification on page 8 leads to AIXI doing what's called "wireheading" - its overriding goal becomes to rewire its head (if you allow that to be an option), and then it's happy. How is this unfriendly? Well, imagine that an asteroid is on course to destroy your Turing machine in 2036. Because AIXI maximizes the sum of r over all the steps, and we presume that the maximum reward it experiences by wireheading is better than being destroyed (otherwise it would commit suicide), getting destroyed by an asteroid is worse than wireheading forever. So AIXI will design an interceptor rocket, maybe hijack some human factories to get it built, paint the asteroid with aluminum so that the extra force from the sun pushes it off course, and then go back to experiencing maximum r(x(k)). Now imagine that a human was going to unplug AIXI in 2013.
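In symbols, the quantity described in this comment can be sketched as below. This is only the comment's own notation written out, with an assumed finite horizon m that the comment does not name; it is not the full expectation over environments from Hutter's paper.

```latex
% r: reward function, x(k): input string at step k (both from the comment).
% m: an assumed finite number of steps, introduced here for illustration.
V = \sum_{k=1}^{m} r\bigl(x(k)\bigr)
```

Under this reading, any action that permanently fixes x(k) at a value with maximal r makes every term of the sum maximal, which is the wireheading claim being made.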

4asr13y

This is not a proof. If you are inconsistent in your premises, anything follows. If you want to formalize this in terms of Turing machines, there's no option for "change the input wires" and no option for "change the Turing machine."

2Manfred13y

You're right. Feel free to formalize my argument at your leisure and tell me where it breaks down. EDIT: All AIXI cares about is the input. And so the proof that rewiring your head can increase reward is simply that r(x) has at least one maximum (since its sum over steps needs to have a maximum), combined with the assumption that the real world does not already maximize the sum of r(x). As for the asteroid, the stuff doing the inputting gets blown up, so the simplest implementation just has the reward be r(null). But you could have come up with that on your own.

6private_messaging13y

I don't think we need to prove wireheading here. It suffices that it only cares about the input, and so will find a way to set that input. If you wire it to a paperclip counter to maximize paperclips, it'll also be searching for a way to replace the counter with infinity or 'trick' the counter (anything goes). If you sit there yourself rewarding it for making paperclips with a pushbutton, its search will include tricking you into pushing the button. I also think that if you want it to self-preserve, you'll need to code in special stuff to equate the self inside the world model (which is not a full model of itself, otherwise infinite recursion) with the self in the real world. Actually, on the recent comment by Eliezer, maybe we agree on this: http://lesswrong.com/lw/3kz/new_years_predictions_thread_2011/3a20 Ahh, by the way: it has to be embedded in the real world, which doesn't seem to allow for infinite computing power, so no full perfect simulation of the real world inside AIXI (or ad infinitum recursion) is allowed. Edit: by AIXI I meant one of the computable approximations (e.g. AIXI-tl).

2asr13y

The argument breaks down because you are equivocating on what the space is to search over and what the utility function in question is. Under a given utility function U, "change the utility function to U' " won't generally have positive utility. Self-awareness and pleasure-seeking aren't some natural properties of optimization processes. They have to be explicitly built in. Suppose you set a theorem-prover to work looking for a proof of some theorem. It's searching over the space of proofs. There's no entry corresponding to "pick a different and easier theorem to prove", or "stop proving theorems and instead be happy."

1Manfred13y

The utility function is r(x) (the "r" is for "reward function"). I'm talking about changing x, and leaving r unchanged.

0asr13y

Yes, I just changed the notation to be more standard. The point remains. There need not be any "x" that corresponds to "pick a new r" or to "pretend x was really x'". If there was such an x, it wouldn't in general have high utility.

1Manfred13y

x is just an input string. So, for example, each x could be a frame coming from a video camera. AIXI then has a reward function r(x), and it maximizes the sum of r(x) over some large number of time steps. In our example, let's say that if the camera is looking at a happy puppy, r is big, if it's looking at something else, r is small. In the lab, AIXI might have to choose between two options (action can be handled by some separate output string, as in Hutter's paper): 1) Don't follow the puppy around. 2) Follow the puppy around. Clearly, it will do 2, because r is bigger when it's looking at a happy puppy, and 2 increases the chance of doing so. One might even say one has a puppy-following robot. In the real world, there are more options - if you give AIXI access to a printer and some scotch tape, options look like this: 1) Don't follow the puppy around. 2) Follow the puppy around. 3) Print out a picture of a happy puppy and tape it to the camera. Clearly, it will do 3, because r is bigger when it's looking at a happy puppy, and 3 increases the chance of doing so. One might even say one has a happy-puppy-looking-at maximizing robot. This time it's even true.
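The puppy example can be sketched as a toy program. This is a minimal illustration, not AIXI: the frame labels, the action names, the horizon, and the deterministic outcome of each action are all assumptions made up here (the comment only says an action "increases the chance" of a high-reward frame).

```python
from typing import Dict, List

def reward(frame: str) -> float:
    """r(x): large when the camera frame shows a happy puppy, small otherwise."""
    return 1.0 if frame == "happy puppy" else 0.0

def total_reward(frames: List[str]) -> float:
    """Sum of r(x(k)) over all time steps k."""
    return sum(reward(frame) for frame in frames)

def best_action(outcomes: Dict[str, List[str]]) -> str:
    """Pick the available action whose resulting frames maximize the reward sum."""
    return max(outcomes, key=lambda action: total_reward(outcomes[action]))

HORIZON = 10  # hypothetical number of time steps

# In the lab, only two actions exist. Following the puppy is assumed to
# put it in frame half of the time.
lab = {
    "ignore puppy": ["empty room"] * HORIZON,
    "follow puppy": ["happy puppy", "empty room"] * (HORIZON // 2),
}

# In the real world, a third action is available: a taped-on picture
# shows a happy puppy in every frame.
real_world = dict(lab)
real_world["tape picture to camera"] = ["happy puppy"] * HORIZON

lab_choice = best_action(lab)
real_choice = best_action(real_world)
print(lab_choice)
print(real_choice)
```

The maximizer only ever chooses among the actions it is given, so whether the taped-picture outcome occurs depends on whether such an action is in the set being searched, which is the point under dispute in this thread.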

2JoshuaZ13y

I'm not aware of any formalization of AIXI that reflects its real world form. Your comment thus amounts to something like a plausibility argument, but trying to formalize it further seems tricky and possibly highly nontrivial.

1Manfred13y

While obviously there are caveats, they are limited. AIXI rewires its inputs if (a) it's possible, and (b) it increases r(x). It's not super-complicated. Maybe I'm missing something about the translation from implementation to the language used in the paper. But nobody is saying "you're missing something." It's more like you're saying "surely it must be complicated!" Well, no.

2JoshuaZ13y

Can you actually formalize what that means in terms of Turing machines? It isn't obvious to me how to do so.

2Manfred13y

AIXI is a noncomputable thing that always picks the option that maximizes the total expected reward r(x(k)). So everything I've been saying has been about functions, not about Turing machines. If rewiring your inputs is possible, and it increases r(x), then AIXI will prefer to do it. Not hard.

2private_messaging13y

Yep. Seems to apply to the limited-time versions as well. At least they don't specify any difference between "doing innovative stuff that you want them to do for the sake of the AI risk argument" and "sitting in a corner masturbating quietly", and the latter looks like a way simpler solution to the problem they are really given (in math) [but not to our loose and fuzzy human-language description of that problem]. What I think is the case is that this whole will to really live and really do stuff is very hard to implement, and implementing it doesn't really add anything to the engineering powers of the AI, so even when it's implemented, it won't result in something that's out-engineering everyone. I'd become concerned if we had engineering tools that are very powerful but are wireheading (or masturbating) left and right to the point that we can't get much use out of them. Then I'd be properly freaked out that if someone fixes this problem somehow, something undesired might happen and it would be impossible to deal with it.

2private_messaging13y

"Otherwise it would commit suicide"... another proof via "ohh otherwise it will do something that I believe is dumb". If the AIXI kills it's model of physical itself inside it's world model, it's actual self inside the real world keeps running and can still calculate rewards. Furthermore, ponder this question: will it rape or will it masturbate? (sorry for sexual analogy but the reproductive imperative is probably best human example here) It can prove the reward value is same. It won't even distinguish those two.

[-]ChrisHallquist13y20

I was wondering - what fraction of people here agree with Holden's advice regarding donations

Prior to reading Holden's article, my last charitable donation had been to an organization working on fighting malaria recommended by GiveWell, and I was tentatively planning on following GiveWell's recommendations for future charitable giving. In that sense, I already agreed with Holden, though I was semi-agnostic on what was actually the best use of my money.

It seemed to me that the payoff from donating to the Singularity Institute was highly uncertain, whe... (read more)

[-][anonymous]13y20

Broadly speaking, I agree with Holden, although possibly not his specific arguments. I'm not convinced that AI will appear in the manner SI postulates, and I have no real reason to believe that they will have an impact on existential probabilities. Similarly, I don't believe that donating to CND helped avert nuclear war.

Given that there are effective charities available which can make an immediate difference to people's lives, I would argue that concerned individuals should donate to those.

[-]Thomas13y10

It is not how probable a really powerful AI is, it is how probable TIMES its impact, of course. And this product is just HUGE in the absolute sense, which people tend to forget through the mistaken reasoning "0 times something = 0". The first zero is not zero, and not even very small, so the second zero isn't a zero either.
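Written out, the product being described is just an expected-impact calculation. The symbols p and I are introduced here for illustration and do not appear in the comment.

```latex
% p: probability that a really powerful AI appears (assumed symbol)
% I: magnitude of its impact (assumed symbol)
\mathbb{E}[\text{impact}] = p \cdot I,
\qquad
p > 0 \;\Rightarrow\; p \cdot I > 0 \ \text{whenever}\ I > 0
```

The "0 times something = 0" shortcut only holds if p is exactly zero; for any nonzero p, a large enough I keeps the product large.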

Therefore I am glad that there is SIAI, after all. At least I find it more important than most of the Academia involved in AI. It was this Academia who maybe failed in AI research in the past decades. Not the SIAI, not the IBM an... (read more)

[-]komponisto13y00

My comment on another post is relevant to this one, so I'm linking to it.
