LESSWRONG

All of redlizard's Comments + Replies

Limitations on Formal Verification for AI Safety

One overriding perspective in Max and my approach is that we need to design our systems so that their safety can be formally verified. In software, people often bring up the halting problem as an argument that general software can't be verified. But we don't need to verify general software, we are designing our systems so that they can be verified.

I am a great proponent of proof-carrying code that is designed and annotated for ease of verification as a direction of development. But even from that starry-eyed perspective, the proposals that Andrew argues... (read more)

Physics is Ultimately Subjective

redlizard2y*117

This seems very confused.

What makes good art a subjective quality is that its acceptance criterion is one that refers to the viewer as one of its terms. The is-good-art() predicate, or the art-quality() real-valued function, has a viewer parameter in it. What makes good physics-theory an objective quality is that its acceptance criterion doesn't refer to the viewer; the is-good-physics-theory() predicate, or the physics-theory-accuracy() real-valued function, is one that compares the theory to reality, without the viewer playing a role as a term inside the... (read more)
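A minimal sketch of that distinction in Python. The names `Viewer`, `Artwork`, `is_good_art` and `physics_theory_accuracy` echo the predicates named in the comment but are otherwise hypothetical, and the scoring rules are placeholders chosen only to make the signatures concrete:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewer:
    name: str
    favourite_style: str  # hypothetical stand-in for a viewer's taste


@dataclass(frozen=True)
class Artwork:
    title: str
    style: str


def is_good_art(artwork: Artwork, viewer: Viewer) -> bool:
    """Subjective: the viewer is one of the terms of the criterion."""
    return artwork.style == viewer.favourite_style


def physics_theory_accuracy(predictions: list[float],
                            observations: list[float]) -> float:
    """Objective: compares theory to reality; there is no viewer parameter.

    Returns the negated mean absolute error, so higher is better.
    """
    errors = [abs(p - o) for p, o in zip(predictions, observations)]
    return -sum(errors) / len(errors)


painting = Artwork("Untitled", "cubism")
alice = Viewer("Alice", "cubism")
bob = Viewer("Bob", "impressionism")

# Same artwork, different viewers, different verdicts.
print(is_good_art(painting, alice), is_good_art(painting, bob))
# Same theory and data give the same score no matter who asks.
print(physics_theory_accuracy([1.0, 2.0], [1.0, 2.5]))
```

The point of the sketch is only the shape of the signatures: one function cannot be evaluated without naming a viewer, the other has no slot for one.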

0Gordon Seidoh Worley2y

But how would you know if anything is actually objective? Have you experienced objectivity not through a subjective lens? This is my point: however strongly we might believe something to be a particular way, it is still something we believe. That doesn't diminish the quality of the belief or its potential truthfulness, but it is important not to miss this. Again, how do you know that this basic distinction exists? Can you point to it without relying on subjective evidence (evidence that passed through some observer)? If you cannot, then anything we might claim to be "objective" still rests upon subjective assessment of evidence. Objectivity is a constructed idea created by subjective agents.

Exams-Only Universities

redlizard2y114

This was my experience studying in the Netherlands as well. University officials were indeed on board with this, with the general assumption being that lectures and instructions and labs and such are a learning resource that you can use or not use at your discretion.

1DavidHolmes2y

Maths at my Dutch university also has homework for quite a few of the courses, which often counts for something like 10-20% of final grade. It can usually be submitted online, so you only need to be physically present for exams. However, there are a small number of courses that are exceptions to this, and actually require attendance to some extent (e.g. a course on how to give a scientific presentation, where a large part of the course consists of students giving and commenting on each other's presentations - not so easy to replace the learning experience with a single exam at the end). But this differs between Dutch universities.

Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

redlizard3y30

I would say that some formal proofs are actually impossible

Plausible. In the aftermath of spectre and meltdown I spent a fair amount of time thinking on how you could formally prove a piece of software to be free of information-leaking side channels, even assuming that the same thing holds for all dependent components such as underlying processors and operating systems and the like, and got mostly nowhere.

In fact, I think survival timelines might even require anyone who might be working on classes of software reliability that don't relate to alignment

redlizard3y90

Yes, I agree with this.

I cannot judge to what degree I agree with your strategic assessment of this technique, though. I interpreted your top-level post as judging that assurances based on formal proofs are realistically out of reach as a practical approach; whereas my own assessment is that making proven-correct [and therefore proven-secure] software a practical reality is a considerably less impossible problem than many other aspects of AI alignment, and indeed one I anticipate to actually happen in a timeline in which aligned AI materializes.

5elspood3y

I would say that some formal proofs are actually impossible, but would agree that software with many (or even all) of the security properties we want could actually have formal-proof guarantees. I could even see a path to many of these proofs today. While the intent of my post was to draw parallel lessons from software security, I actually think alignment is an oblique or orthogonal problem in many ways. I could imagine timelines in which alignment gets 'solved' before software security. In fact, I think survival timelines might even require anyone who might be working on classes of software reliability that don't relate to alignment to actually switch their focus to alignment at this point. Software security is important, but I don't think it's on the critical path to survival unless somehow it is a key defense against takeoff. Certainly many imagined takeoff scenarios are made easier if an AI can exploit available computing, but I think the ability to exploit physics would grant more than enough escape potential.

Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

redlizard3y380

Assurance Requires Formal Proofs, Which Are Provably Impossible

The Halting Problem puts a certain standard of formalism outside our reach

This is really not true. The halting problem only makes it impossible to write a program that can analyze a piece of code and then reliably say "this is secure" or "this is insecure". It is completely possible to write an analyzer that can say "this is secure" for some inputs, "this is definitely insecure for reason X" for some other inputs, and "I am uncertain about your input so please go improve it" for everythin... (read more)
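A toy illustration of such a three-valued analyzer, sketched in Python. It swaps in termination for security as the property being checked, works on Python source via the standard `ast` module, and uses deliberately crude rules; `Verdict`, `FORBIDDEN` and `analyze` are names invented for this sketch, and the soundness of the two definite answers is an informal assumption about this small fragment, not a proof:

```python
import ast
from enum import Enum


class Verdict(Enum):
    HALTS = "definitely halts"
    LOOPS_FOREVER = "definitely does not halt"
    UNKNOWN = "uncertain, please simplify the program"


# Constructs this toy analyzer refuses to reason about.
FORBIDDEN = (
    ast.While, ast.For, ast.AsyncFor, ast.Call, ast.FunctionDef,
    ast.AsyncFunctionDef, ast.Lambda, ast.ClassDef, ast.Import,
    ast.ImportFrom, ast.ListComp, ast.SetComp, ast.DictComp,
    ast.GeneratorExp, ast.With, ast.AsyncWith, ast.Await,
    ast.Yield, ast.YieldFrom,
)


def analyze(source: str) -> Verdict:
    tree = ast.parse(source)
    first = tree.body[0] if tree.body else None

    # Definite "no": the program starts with `while True:` whose body is only `pass`.
    if (isinstance(first, ast.While)
            and isinstance(first.test, ast.Constant)
            and first.test.value is True
            and all(isinstance(stmt, ast.Pass) for stmt in first.body)):
        return Verdict.LOOPS_FOREVER

    # Definite "yes": no loops, calls, or definitions anywhere in the program.
    if not any(isinstance(node, FORBIDDEN) for node in ast.walk(tree)):
        return Verdict.HALTS

    # Everything else: decline to answer rather than guess.
    return Verdict.UNKNOWN


print(analyze("x = 1 + 2\ny = x * 3"))
print(analyze("while True:\n    pass"))
print(analyze("n = 27\nwhile n != 1:\n    n = n // 2 if n % 2 == 0 else 3 * n + 1"))
```

The halting problem only rules out shrinking the `UNKNOWN` bucket to nothing for arbitrary inputs; it says nothing against an analyzer that is always right when it does commit to an answer.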

7elspood3y

It would be nice to be able to have this important impossible thing. :) I think we are trying to say the same thing, though. Do you agree with this more concise assertion? "It's not possible to make a high confidence checker system that can analyze an arbitrary specification, but it is probably possible (although very hard) to design systems that can be programmatically checked for the important qualities of alignment that we want, if such qualities can also be formally defined."

2020 Election: Prediction Markets versus Polling/Modeling Assessment and Postmortem

redlizard4y10

We see this a lot on major events, as I’ve noted before, like the Super Bowl or the World Cup. If you are on the ball, you’ll bet somewhat more on a big event, but you won’t bet that much more on it than on other things that have similarly crazy prices. So the amount of smart money does not scale up that much. Whereas the dumb money, especially the partisans and gamblers, come out of the woodwork and massively scale up.

This sounds like it should generalize to "in big events, especially those on which there are vast numbers of partisans on both sides, th... (read more)

3Zvi4y

That goes too far. What happens is that the line is less accurate than normal, and there will almost always be good value if you look around the various offerings. But there are a lot of forces that will come in hard if the number gets super wrong, and a lot of ways for even relatively dumb money to know more or less what the odds should be. So if a WC match is actually 65-35, it might be 60-40 or 70-30 instead, which is a great opportunity, but it's not going to be useless. It depends what you already know - you should have a fair value in mind, expect a second value that's different, then look at what you find. And if it's not what you expected, maybe you modeled the public wrong, but also maybe you're missing something. Basically, if you want to know who the favorite is, the line is trustworthy unless it's super close (52-48 or something). Exactly how big a favorite, and especially the various secondary lines, are less trustworthy. In an election, you don't have those kinds of anchors, so it's easier to get far out of whack.

An Orthodox Case Against Utility Functions

redlizard5y30

I do not think you are selling a strawman, but the notion that a utility function should be computable seems to me to be completely absurd. It seems like a confusion born from not understanding what computability means in practice.

Say I have a computer that will simulate an arbitrary Turing machine T, and will award me one utilon when that machine halts, and do nothing for me until that happens. With some clever cryptocurrency scheme, this is a scenario I could actually build today. My utility function ought plausibly to have a term in it that assigns a po... (read more)
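A small sketch of the setup described above, in Python. A generator stands in for the Turing machine T (here a hypothetical `collatz_machine`, chosen only because it is short), and `utilons_after` reports the payout observed after a bounded amount of simulation. Both names are made up for this illustration:

```python
from typing import Iterator


def collatz_machine(n: int) -> Iterator[None]:
    """Hypothetical stand-in for the machine T.

    Yields once per simulated step and returns (halts) when n reaches 1.
    """
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        yield


def utilons_after(machine: Iterator[None], max_steps: int) -> int:
    """Payout seen after advancing the machine at most max_steps times.

    Returns 1 if the machine was observed to halt within the budget, else 0.
    """
    for _ in range(max_steps):
        try:
            next(machine)
        except StopIteration:
            return 1
    return 0


print(utilons_after(collatz_machine(6), 5))    # budget too small to see the halt
print(utilons_after(collatz_machine(6), 100))  # halt observed within the budget
```

Raising the budget can only move the observed payout from 0 to 1, never back, so the payout can be approximated from below. What no finite budget can do is certify that the machine will never halt, which is where computability enters.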

1TAG5y

That seems to conflate two different things: whether you can compute the occurrence of event E, as opposed to whether you could compute your preference for E over not E.

4AlexMennen5y

No, you can't do that today. You could produce a contraption that will deposit 1 BTC into a certain bitcoin wallet if and when some computer program halts, but this won't do the wallet's owner much good if they die before the program halts. If you reflect on what it means to award someone a utilon, rather than a bitcoin, I maintain that it isn't obvious that this is even possible in theory. There is a notion of computability in the continuous setting. This seems like a strawman to me. A better motivation would be that agents that actually exist are computable, and a utility function is determined by judgements rendered by the agent, which is incapable of thinking uncomputable thoughts.

Book Review: Design Principles of Biological Circuits

redlizard5y50

This second claim sounds a bit trivial to me. Perhaps it is my reverse engineering background, but I have always taken it for granted that approximately any mechanism is understandable by a clever human given enough effort.

This book [and your review] explains a number of particular pieces of understanding of biological systems in detail, which is super interesting; but the mere point that these things can be understood with sufficient study almost feels axiomatic. Ignorance is in the map, not the territory; there are no confusing phenomena, only m... (read more)

johnswentworth5y130

The second claim was actually my main goal with this post. It is a claim I have heard honest arguments against, and even argued against myself, back in the day. A simple but not-particularly-useful version of the argument would be something like "the shortest program which describes biological behavior may be very long", i.e. high Kolmogorov complexity. If that program were too long to fit in a human brain, then it would be impossible for humans to "understand" the system, in some sense. We could fit the program in a long book, maybe, b... (read more)

What Programming Language Characteristics Would Allow Provably Safe AI?

Answer by redlizardSep 04, 201980

If you are going to include formal proofs with your AI showing that the code does what it's supposed to, in the style of Coq and friends, then the characteristics of traditionally unsafe languages are not a deal-breaker. You can totally write provably correct and safe code in C, and you don't need to restrict yourself to a sharply limited version of the language either. You just need to prove that you are doing something sensible each time you perform a potentially unsafe action, such as accessing memory through a pointer.

This slows things down a... (read more)

3Davidmanheim6y

This is really helpful - thanks!

A Personal Rationality Wishlist

redlizard6y30

The point is: if people understood how their bicycle worked, they’d be able to draw one even without having to literally have one in front of them as they drew it!

I don't think this is actually true. Turning a conceptual understanding into an accurate drawing is a nontrivial skill. It requires substantial spatial visualization ability, as well as quite a bit of drawing skill -- one who is not very skilled in drawing, like myself, might poorly draw one part of a bike, want to add two components to it, and then realize that there is no way to add a thir... (read more)

Change A View: An interesting online community

redlizard6y10

Even if such a person decides to do this, they will eventually get fed up and leave.

Will they, necessarily? The structure of the problem you describe sounds a lot like any sort of teaching, which involves a lot of finding out what a student misunderstands about a particular topic and then fixing that, even if you clear up that same misunderstanding for a different student every week. There are lots of people who do not get fed up with that. What makes this so different?

3romeostevensit6y

unpaid internet arguing, without the reward of seeing a change positively impact someone's life. The selection effect means you wind up interacting mostly with those who want to argue rather than collaborate.

Pecking Order and Flight Leadership

redlizard6y40

Pigeons have stable, transitive hierarchies of flight leadership, and they have stable pecking order hierarchies, and these hierarchies do not correlate.

one of the things you can do with the power to give instructions is to instruct others to give you more goodies.

It occurs to me that leading a flight is an unusual instruction-giving power, in that it comes with almost zero opportunities to divert resources in your own direction. Choosing where to fly and when to land affects food options, but it does not affect your food options relative to your flight-ma... (read more)

Thoughts on Ben Garfinkel's "How sure are we about this AI stuff?"

redlizard6y40

General rationality question that should not be taken to reflect any particular opinion of mine on the topic at hand:

At what point should "we can't find any knowledgeable critics offering meaningful criticism against <position>" be interpreted as substantial evidence in favor of <position>, and prompt one to update accordingly?

4Viliam6y

I feel it's like "A -> likely B" being evidence for "B -> likely A"; generally true, but it could be either very strong or very weak evidence depending on the base rates of A and B. Not having knowledgeable criticism against position "2 + 2 = 4" is strong evidence, because many people are familiar with the statement, many use it in their life or work, so if it were wrong, someone would likely have offered some solid criticism already. But for statements that are less known or less cared about, it becomes more likely that there are good arguments against them, but no one noticed them yet, or no one bothered to write a solid paper about them.
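The dependence on base rates can be made concrete with Bayes' rule. The sketch below is in Python with exact fractions; the function name `posterior` and all of the probabilities are illustrative assumptions, not estimates of anything real:

```python
from fractions import Fraction


def posterior(prior: Fraction,
              p_silence_if_true: Fraction,
              p_silence_if_false: Fraction) -> Fraction:
    """P(position is true | no knowledgeable criticism has appeared)."""
    numerator = prior * p_silence_if_true
    return numerator / (numerator + (1 - prior) * p_silence_if_false)


half = Fraction(1, 2)

# Widely examined claim: if it were false, silence would be very unlikely.
well_known = posterior(half, Fraction(99, 100), Fraction(1, 100))

# Obscure claim: critics would probably stay silent even if it were false.
obscure = posterior(half, Fraction(99, 100), Fraction(9, 10))

print(well_known)  # 99/100
print(obscure)     # 11/21
```

With the same prior and the same observation, the update is large in the first case and close to nothing in the second; the only thing that changed is how likely silence would be if the position were wrong.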

Good arguments against "cultural appropriation"

redlizard6y30

Having lost this signaling tool, we are that much poorer.

Are we? Signaling value is both a blessing and a curse, and my impression is that it is generally zero-sum. Personally, I consider myself *richer* when a mundane activity or lifestyle choice loses its signaling association, for it means I am now less restricted in applying it.

2Tyrrell_McAllister6y

You may be interpreting "signalling" in a more specific way than I intended. You might be thinking of the kind of signalling that is largely restricted to status jockeying in zero-sum status games. But I was using "signaling tool" in a very general sense. I just mean that you can use the signaling tool to convey information, and that you and your intended recipients have common knowledge about what your signal means. In that way, it's basically just a piece of language. As with any piece of language, the fact that it signals something does place restrictions on what you can do. For example, you can't yell "FIRE!" unless you are prepared to deal with certain consequences. But if the utterance "FIRE!" had no meaning, you would be freer, in a sense, to say it. If the mood struck you, you could burst out with a loud shout of "FIRE!" without causing a big commotion and making a bunch of people really angry at you. But you would also lack a convenient tool that reliably brings help when you need it. This is a case where I think that the value of the signal heavily outweighs the restrictions that the signal's existence places on your actions.

3Wei Dai6y

I think you bring up a good point, but rather than being zero-sum, signaling can be either socially beneficial or detrimental compared to no signaling, depending on the details of the situation, so in theory removing a signaling tool can make us either richer or poorer. I'm not sure if economists have a consensus on whether signaling is typically good or bad, and would be curious if anyone knows this.

Fixed Point Exercises

redlizard6y110

At the time of writing, for the two spoilers in the main post, hovering over either will reveal both. Is that intentional? It does not seem desirable.

8habryka6y

Nope, not intentional. Will see whether I can get around to fixing that today.

The funnel of human experience

redlizard6y41

I think there is about a three orders of magnitude difference between the difficulties of "inventing calculus where there was none before" and "learning calculus from a textbook explanation carefully laid out in the optimal order, with each component polished over the centuries to the easiest possible explanation, with all the barriers to understanding carefully paved over to construct the smoothest explanatory trajectory possible".

(Yes, "three orders of magnitude" is an actual attempt to estimate something, insofar as that is at all meaningful for an unquantified gut instinct; it's not just something I said for rhetorical effect.)

"Now here's why I'm punching you..."

redlizard6y70

I think it will be next to impossible to set up a community norm around this issue for all communities save those with a superhuman level of general honesty. For if there is a norm like this in place, Alice always has a strong incentive to pretend that she is punching based on some generally accepted theory, and that the only thing that needs arguing is the application of this theory to Bob (point 2). Even when there is in fact a new piece of theory ready to be factored out of Alice's argument, it is in Alice's interest to pass this off as being ... (read more)

2philh6y

To be clear, I think this is a good (prosocial) way for individuals to act. I'm not trying to advocate that we should make it a community norm. But I'm unconvinced by this particular failure mode. Surely this incentive exists anyway for Alice? There's no existing norm against what I propose. I don't see why this would be. At least not any general principle that her readers will be familiar with and agree with, which is what would be required. I'm not suggesting that after Alice publishes part (2), people who don't think "punching Bob is better than the alternatives" should punch Bob. Alice doesn't just need to convince people that there is an argument for punching Bob, she needs to convince people to punch Bob.

Fundamentals of Formalisation level 1: Basic Logic

redlizard7y230

Why oh why does this system make it so needlessly inconvenient to partake in these courses?

I just want to read the lecture material on the topics that interest me, and possibly do some of the exercises. Why do I need to create an account for that using a phony email, subscribe to courses, take tests with limited numbers of retries, and all of that? I am not aiming to get a formal diploma here, and I don't think you plan on awarding me any. So why can't I just... browse the lectures in a stateless web 1.0 fashion?

3[anonymous]7y

Thank you for your criticism. We need more of that. A pipeline has 2 purposes: training people and identifying good students. We want to do the latter as much as the former. Not just for the sake of the institutions we ultimately wish to recommend candidates to, but also for the sake of the candidates that want to know whether they are up to the task. We recently did a poll on Facebook asking "what seems to be your biggest bottleneck to becoming a researcher" and "I'm not sure I'm talented enough" was the most popular option by far (doubling the next one). I agree that it looks silly right now because we're a tiny startup that uploaded 2 videos and a few guides to some textbooks, and it will probably be this small for at least a year to come. You got me to consider using something more humble in the meantime. I'll bring it up in our next meeting.

gjm7y110

It looks to me as if ihatestatistics.com is a for-profit business selling educational materials and systems to universities. The design decisions that make most sense for them are not necessarily the most convenient for students. So e.g. they may be keen to be able to distinguish one student from another, take measures against cheating, encourage and/or measure "engagement", etc. The things they do to that end may be annoying for students, but they still do them because they make their actually-paying customers happier, or make it easier to convi... (read more)

In Defense of Ambiguous Problems

redlizard7y50

This. The phrasing "if you are on Vulcan, then you are on the mountain" *sounds* like it should be orthogonal to, and therefore gives no new information on and cannot affect the probability of, your being on Vulcan.

This is quite false, as can be shown easily by the statement "if you are on Vulcan, then false". But it is a line of reasoning I can see being tempting.
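A worked check in Python, assuming the "if ... then ..." is read as a material conditional (the reading that the "then false" example relies on). The function name and the probabilities are hypothetical, chosen only to show the direction of the update:

```python
from fractions import Fraction


def p_antecedent_given_implication(p_a_and_c: Fraction,
                                   p_a_and_not_c: Fraction) -> Fraction:
    """P(A | A -> C), where A -> C is the material conditional (not A) or C."""
    p_not_a = 1 - p_a_and_c - p_a_and_not_c
    p_implication = p_not_a + p_a_and_c  # worlds where the conditional holds
    return p_a_and_c / p_implication


quarter, half = Fraction(1, 4), Fraction(1, 2)

# Prior P(Vulcan) = 1/2, split evenly between "on the mountain" and "not".
print(p_antecedent_given_implication(quarter, quarter))   # 1/3

# "If you are on Vulcan, then false": the consequent never holds.
print(p_antecedent_given_implication(Fraction(0), half))  # 0
```

Learning the conditional rules out the worlds where the antecedent holds and the consequent fails, so the probability of the antecedent drops unless those worlds already had probability zero.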

Three types of "should"

redlizard7y80

To me, this form of "epistemic should" doesn't feel like a responsibility-dodge at all. To me, it carries a very specific meaning of a particular warning: "my abstract understanding predicts that X will happen, but there are a thousand and one possible gotchas that could render that abstract understanding inapplicable, and I have no specific concrete experience with this particular case, so I attach low confidence to this prediction; caveat emptor". It is not a shoving off of responsibility, so much as a marker of low confidence, a... (read more)

The simple picture on AI safety

redlizard7y80

I agree that a distillation of a complex problem statement to a simple technical problem represents real understanding and progress, and is valuable thereby. But I don't think your summary of the first half of the AI safety problem is one of these.

The central difficulty that stops this from being a "mere" engineering problem is that we don't know what "safe" is to mean in practice; that is, we don't understand in detail *what properties we would desire a solution to satisfy*. From an engineering perspective, that marks th... (read more)

2Alex Flint7y

I parse you as pointing to the clarification of a vague problem like "flight" or "safety" or "heat" into an incrementally more precise concept or problem statement. I agree this type of clarification is ultra important and represents real progress in solving a problem, and I agree that my post absolutely did not do this. But I was actually shooting for something quite different. I was shooting for a problem statement that (1) causes people to work on the problem, and (2) causes them to work on the right part of the problem. I claim it is possible to formulate such a problem statement without doing any clarification in the sense that you pointed at, and additionally that it is useful to do so because (1) distilled problem statements can cause additional progress to be made on a problem, and (2) clarification is super hard, so we definitely shouldn't block causing additional work to happen until clarification happens, since additional work could be a key ingredient in getting to key clarifications. To many newcomers to the AI safety space, the problem feels vast and amorphous, and it seems to take a long time before newcomers have confidence that they know what exactly other people in the space are actually trying to accomplish. During this phase, I've noticed that people are mostly not willing to work directly on the problem, because of the suspicion that they have completely misunderstood where the core of the problem actually is. This is why distillation is valuable even absent clarification.

Slack

redlizard7y161

I think what we are looking at here is Moloch eating all slack out of the system. I think that is a summary of about 75% of what Moloch does.

More Dakka

redlizard7y30

In these cases, I do not think such explanations are enough.

Eliezer gives the model of researchers looking for citations plus grant givers looking for prestige, as the explanation for why his SAD treatment wasn’t tested. I don’t buy it. Story doesn’t make sense.

On my model, the lack of exploitability is what allowed the failure to happen, whereas your theory on reasons why people do not try more dakka may be what caused the failure to happen.

If the problem were exploitable in the Eliezer-sense, the market would bulldoze straight through the roadblocks y... (read more)

4Zvi7y

Our models differ in the magnitude of prestige effects, but not sure they disagree that much. So I think that yes, you'd get a lot less prestige for working out the details, but that it's still a very 'good deal' in prestige terms given the size of the opportunity. I also think that there's a difference between making little tweaks that improve matters versus making a large tweak that makes things much better; the first one has a much bigger low-prestige problem. Basically I think that yes, you take an order-of-magnitude hit to prestige here, but it's more than made up for by the ease of finding and exploiting the problem. In terms of the market bulldozing through such things, I have much less faith that markets are so reliably good at such things. I think they're very good, but unreliable without imposing several additional constraints that often don't hold or only partially hold. Yes, being exploitable in that sense much improves the chance someone will exploit slash fix the issue, but the search process for things to exploit, and the decision process to do so, and the requirements to do so, and the opportunity cost of doing so, and so forth, make it quite easy for exploitable things to sit there unexploited, or for things that are exploitable once you notice with the right other personal circumstances to go mostly unnoticed and therefore unexploited, and for many things that are exploitable somewhat to not get exploited, while other things are what is called 'overdone trades' where too many people try to exploit something that is not exploitable enough. Much of the time, the real cost that makes something unexploitable is the step of noticing the opportunity and taking the time to analyze it, which isn't an obviously exploitable opportunity for exploration, whereas the actual exploitation process then becomes clearly good. In fact, if exploration of a problem is a marginal decision, you should expect to therefore find exploitable actions from it, just not enough

Intrinsic properties and Eliezer's metaethics

redlizard8y00

My interpretation of this thesis immediately reminds me of Eliezer's post on locality and compactness of specifications, among others.

Under this framework, my analysis is that triangle-ness has a specification that is both compact and local; whereas L-opening-ness has a specification that is compact and nonlocal ("opens L"), and a specification that is local but noncompact (a full specification of what shapes it is and is not allowed to have), but no specification that is both local and compact. In other words, there is a short specification which... (read more)

Rationality Quotes Thread December 2015

redlizard9y-10

Consensus tends to be dominated by those who will not shift their purported beliefs in the face of evidence and rational argument.

Jim

gjm9y130

This appears to be empirically incorrect, at least in some fields. A few examples:

Creationists are much less willing to adjust their beliefs on the basis of evidence and argument than scientifically-minded evolutionists, but evolution rather than special creation is the consensus position these days.
It looks to me (though I confess I haven't looked super-hard) as if the most stubborn-minded economists are the adherents of at-least-slightly-fringey theories like "Austrian" economics rather than the somewhere-between-Chicago-and-Keynes mainstream.
Consensus views in hard sciences like physics are typically formed by evidence and rational argument.

Stupid Questions May 2015

redlizard10y40

I have tried exactly this with basic topology, and it took me bloody ages to get anywhere despite considerable experience with coq. It was a fun and interesting exercise in both the foundations of the topic I was studying and coq, but it was by no means the most efficient way to learn the subject matter.

Principles of Disagreement

redlizard10y*30

My take on it:

You judge an odds ratio of 15:85 for the money having been yours versus it having been Nick's, which presumably decomposes into a maximum entropy prior (1:1) multiplied by whatever evidence you have for believing it's not yours (15:85). Similarly, Nick has an 80:20 odds ratio that decomposes into the same 1:1 prior multiplied by 80:20 evidence.

In that case, the combined estimate would be the combination of both odds ratios applied to the shared prior, yielding a 1:1 * 15:85 * 80:20 = 12:17 ratio for the money being yours versus it being Nick's. Thus, you deserve 12/29 of it, and Nick deserves the remaining 17/29.
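The arithmetic can be checked with a few lines of Python. The helper `combine_odds` is a name made up for this sketch; it multiplies a prior odds ratio by each piece of evidence, treating the two estimates as independent evidence on a shared prior, as the comment assumes:

```python
from fractions import Fraction


def combine_odds(prior, *likelihood_ratios):
    """Multiply prior odds (for, against) by each evidence ratio.

    Returns the resulting shares as exact fractions (share_for, share_against).
    """
    odds_for, odds_against = prior
    for ratio_for, ratio_against in likelihood_ratios:
        odds_for *= ratio_for
        odds_against *= ratio_against
    total = odds_for + odds_against
    return Fraction(odds_for, total), Fraction(odds_against, total)


# 1:1 prior, your 15:85 evidence, Nick's 80:20 evidence (all yours : Nick's).
yours, nicks = combine_odds((1, 1), (15, 85), (80, 20))
print(yours, nicks)  # 12/29 17/29
```

Changing the first argument from `(1, 1)` to another prior changes the split, so the answer depends on the choice of the maximum entropy prior.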

2cousin_it5y

Yeah, I made a pointlessly longer calculation and got the same answer. (And by varying the prior from 0.5 to other values, you can get any other answer.)

Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113

redlizard10y10

So it's nonstandard clever wordplay. Voldemort will still anticipate a nontrivial probability of Harry managing undetected clever wordplay. Which means it only has a real chance of working when threatening something that Voldemort can't test immediately.

0dxu10y

Correct. I address this in another comment.

Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113

redlizard10y30

I don't think this is likely, if only because of the unsatisfyingness. However:

And the messages would come out in riddles, and only someone who heard the prophecy in the seer's original voice would hear all the meaning that was in the riddle. There was no possible way that Millicent could just give out a prophecy any time she wanted, about school bullies, and then remember it, and if she had it would've come out as 'the skeleton is the key' and not 'Susan Bones has to be there'. (Ch.77)

Some foreshadowing on the idea of ominous-sounding prophecy terms a... (read more)

2014 Survey Results

redlizard10y230

MIRI Mission/MIRI Effectiveness .395 (1331)

This result sets off my halo effect alarm.

2014 Less Wrong Census/Survey

redlizard10y480

I took the survey. No scanner available, alas.

What math is essential to the art of rationality?

redlizard10y00

Seconded. P versus NP is the most important piece of the basic math of computer science, and a basic notion of algorithms is a bonus. The related broader theory which nonetheless still counts as basic math is algorithmic complexity and the notion of computability.

Harry Potter and the Methods of Rationality discussion thread, July 2014, chapter 102

redlizard11y160

I've always modeled it as a physiological "mana capacity" aspect akin to muscle mass -- something that grows both naturally as a developing body matures, and as a result of exercise.

Against utility functions

redlizard11y40

Certainly, though I should note that there is no original work in the following; I'm just rephrasing standard stuff. I particularly like Eliezer's explanation about it.

Assume that there is a set of things-that-could-happen, "outcomes", say "you win $10" and "you win $100". Assume that you have a preference over those outcomes; say, you prefer winning $100 over winning $10. What's more, assume that you have a preference over probability distributions over outcomes: say, you prefer a 90% chance of winning $100 and a 10% chance o... (read more)
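A minimal sketch of that setup in Python: outcomes, probability distributions over them, and a preference between distributions induced by expected utility. The utility values assigned to the two outcomes are arbitrary assumptions (any numbers with the $100 outcome ranked higher would do), and `expected_utility` and `prefers` are names invented for the illustration:

```python
from fractions import Fraction

# Hypothetical utilities; only their ordering reflects the stated preference.
utility = {"win $10": Fraction(1), "win $100": Fraction(2)}


def expected_utility(lottery: dict) -> Fraction:
    """Expected utility of a probability distribution over outcomes."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def prefers(a: dict, b: dict) -> bool:
    """True if lottery a is strictly preferred to lottery b."""
    return expected_utility(a) > expected_utility(b)


mostly_100 = {"win $100": Fraction(9, 10), "win $10": Fraction(1, 10)}
coin_flip = {"win $100": Fraction(1, 2), "win $10": Fraction(1, 2)}

print(expected_utility(mostly_100))    # 19/10
print(expected_utility(coin_flip))     # 3/2
print(prefers(mostly_100, coin_flip))  # True
```

The von Neumann-Morgenstern theorem runs in the other direction from this sketch: it starts from a preference ordering over lotteries that satisfies its axioms and concludes that some such `utility` table must exist.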

3Lumifer11y

Right. So, keeping in mind that the issue is separating the pure mathematical structure from the messy world of humans, tell me what outcomes are, mathematically. What properties do they have? Where can we find them outside of the argument list to the utility function?

Against utility functions

redlizard11y30

It's full of hidden assumptions that are constantly violated in practice, e.g. that an agent can know probabilities to arbitrary precision, can know utilities to arbitrary precision, can compute utilities in time to make decisions, makes a single plan at the beginning of time about how they'll behave for eternity (or else you need to take into account factors like how the agent should behave in order to acquire more information in the future and that just isn't modeled by the setup of vNM at all), etc.

Those are not assumptions of the von Neumann-Morgens...

1Lumifer11y

Can you describe this "mathematical structure" in terms of mathematics? In particular, the argument(s) to this function, what do they look like mathematically?

Against utility functions

redlizard11y160

It's more than a metaphor; a utility function is the structure any consistent preference ordering that respects probability must have. It may or may not be a useful conceptual tool for practical human ethical reasoning, but "just a metaphor" is too strong a judgment.

jsteinhardt11y130

I don't think I have much to add to this discussion that you guys aren't already going to have covered, except to note that Qiaochu definitely understands what a utility function is and all of the standard arguments for why they "should" exist, so his beliefs are not a function of not having heard these arguments (just noting this because this thread and some of the siblings seem to be trying to explain basic concepts to Qiaochu that I'm confident he already knows, and I'm hoping that pointing this out will speed up the discussion).

Qiaochu_Yuan11y410

a utility function is the structure any consistent preference ordering that respects probability must have.

This is the sort of thing I mean when I say that people take utility functions too seriously. I think the von Neumann-Morgenstern theorem is much weaker than it initially appears. It's full of hidden assumptions that are constantly violated in practice, e.g. that an agent can know probabilities to arbitrary precision, can know utilities to arbitrary precision, can compute utilities in time to make decisions, makes a single plan at the beginning of ...

David_Gerard11y190

"a utility function is the structure any consistent preference ordering that respects probability must have."

Yes, but humans still don't have one. It's not even clear they can make themselves have one.

The Power of Noise

redlizard11y10

A more involved post about those Bad Confused Thoughts and the deep Bayesian issue underlying it would be really interesting, when and if you ever have time for it.

Some alternatives to “Friendly AI”

redlizard11y80

Upvoted for the simple reason that this is probably the first article I've EVER seen with a title of the form 'discussion about ' which is in fact about the quoted term, rather than the concept it refers to.

Come up with better Turing Tests

redlizard11y120

As a point of interest, I want to note that behaving like an illiterate immature moron is a common tactic for (usually banned) video game automation bots when faced with a moderator who is onto you, for exactly the same reason used here -- if you act like someone who just can't communicate effectively, it's really hard for others to reliably distinguish between you and a genuine foreign 13-year-old who barely speaks English.

Can noise have power?

redlizard11y50

"Worst case analysis" is a standard term of art in computer science, that shows up as early as second-semester programming, and Eliezer will be better understood if he uses the standard term in the standard way.

Actually, in the context of randomized algorithms, I've always seen the term "worst case running time" refer to Oscar's case 6, and "worst-case expected running time" -- often somewhat misleadingly simplified to "expected running time" -- refer to Oscar's case 2.

A computer scientist would not describe the

...
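For reference, the two terms distinguished above can be written out explicitly. This is a sketch in notation that does not appear in the original comment: $T(x, r)$ is an assumed name for the running time of a randomized algorithm on input $x$ with random bits $r$, and $n$ is the input size. The mapping onto Oscar's numbered cases is not restated here.

$$T_{\text{worst}}(n) = \max_{|x| = n} \; \max_{r} \; T(x, r)$$

$$T_{\text{worst-exp}}(n) = \max_{|x| = n} \; \mathbb{E}_{r}\left[ T(x, r) \right]$$

Since an expectation never exceeds a maximum, $T_{\text{worst-exp}}(n) \le T_{\text{worst}}(n)$. Randomized quicksort is the usual example of the gap: its worst-case expected running time is $O(n \log n)$, while its worst-case running time over both inputs and random choices is $O(n^2)$.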

[LINK] Prisoner's Dilemma? Not So Much

redlizard11y40

Group selectionism alert. The "we are optimized for effectively playing the iterated prisoner's dilemma" argument, AKA "people will remember you being a jackass", sounds much more plausible.

3Shmi11y

I made no argument that cooperation emerges in the PD environment, quite the opposite. I argued that, once it emerged in a non-PD environment, it does not necessarily die out in a PD environment. No group selection required.

Rationality Quotes May 2014

redlizard11y280

Even with measurements in hand, old habits are hard to shake. It’s easy to fall in love with numbers that seem to agree with you. It’s just as easy to grope for reasons to write off numbers that violate your expectations. Those are both bad, common biases. Don’t just look for evidence to confirm your theory. Test for things your theory predicts should never happen. If the theory is correct, it should easily survive the evidential crossfire of positive and negative tests. If it’s not you’ll find out that much quicker. Being wrong efficiently is what scienc

...

The Fallacy of Gray

redlizard11y70

I already knew it, but this post made me understand it.

2013 Survey Results

redlizard11y120

Passphrase: eponymous haha_nice_try_CHEATER

Well played :)

9RRand11y

True, though they forgot to change the "You may make my anonymous survey data public (recommended)" to "You may make my ultimately highly unanonymous survey data public (not as highly recommended)".

Tell Culture

redlizard11y80

Trust -- the quintessential element of your so-called "tell culture" -- and vulnerability are two sides of the same coin.

That's true in general. In network security circles, a trusted party is one with the explicit ability to compromise you, and that's really the operational meaning of the term in any context.

How do you tell proto-science from pseudo-science?

redlizard11y00

My own definition - proto-science is something put forward by someone who knows the scientific orthodoxy in the field, suggesting that some idea might be true. Pseudo-science is something put forward by someone who doesn't know the scientific orthodoxy, asserting that something is true.

This seems like an excellent heuristic to me (and probably one of the key heuristics people actually use for making the distinction), but not valid as an actual definition. For example, Sir Roger Penrose's quantum consciousness is something I would classify as pseudoscience without a second thought, despite the fact that Penrose as a physicist should know and understand the orthodoxy of physics perfectly well.

2013 Less Wrong Census/Survey

redlizard11y40

Taking the survey IS posting something insightful.

2013 Less Wrong Census/Survey

redlizard11y330

Taken to completion.

The Cryonics Status question really needs an "other" answer. There are more possible statuses one can be in than the ones given; in particular there are more possible "I'd want to, but..." answers.

Harry Potter and the Methods of Rationality discussion thread, part 26, chapter 97

redlizard12y40

To figure out a strange plot, look at what happens, then ask who benefits. Except that Dumbledore didn't plan on you trying to save Granger at her trial, he tried to stop you from doing that. What would've happened if Granger had gone to Azkaban? House Malfoy and House Potter would've hated each other forever. Of all the suspects, the only one who wants that is Dumbledore. So it fits. It all fits. The one who really committed the murder is - Albus Dumbledore!

I think if you use this line of reasoning and then allow yourself to dismiss arbitrary parts of ...

0Benquo12y

Well, Harry trying to save Hermione could have been part of the plan, but it seems like both major candidates (Quirrel and Dumbledore) thought (or at least hoped) that Harry would not succeed.