All of Liam Donovan's Comments + Replies

Why are you calling this a nitpick? IMO it's a major problem with the post -- I was very unhappy that no mention was made of this obvious problem with the reasoning presented.

3noggin-scratcher
Because the central idea of the post isn't really about that specific probability puzzle, and can in theory stand alone to succeed or fail on other merits - regardless of whether that illustrative example in particular is actually a good choice of example. Possibly there are better examples in the full paper linked, but I couldn't comment on that either way because I've only read this excerpt/summary.

For what it's worth, I was just learning about the basics of MIRI's research when this came out, and reading it made me less convinced of the value of MIRI's research agenda. That's not necessarily a major problem, since the expected change in belief after encountering a given post should be 0, and I already had a lot of trust in MIRI. However, I found this post by Jessica Taylor vastly clearer and more persuasive (it was written before "Rocket Alignment", but I read "Rocket Alignment" first). In particular, I would ... (read more)

Maybe people who rationalized their failure to lose weight by "well, even Eliezer is overweight, it's just metabolic disprivilege"

6Raemon
I think it's more like "people who are currently struggling to lose weight, or get out of negative cycles of crippling anxiety about it, seeing it made light of is hurtful." (I think one can hold a legitimate position that that's a real problem they can't just 'snap out of' or whatnot, and that watching the speech would be legitimately harmful to them, independent of whether you think it's better on-net for society to cater to that) My actual current position was something like "it was good to move it to the end as an 'after-the-credits' scene, and good to describe it as optional, but because everyone was still seated and in some cases it was really awkward to move around, it didn't really feel like a live option, and it would have been better to do something like 'let people start to leave slightly before starting the bit so that people who wanted to keep moving out had an easier time doing so'"

How many people raised their hands when Eliezer asked about the probability estimate? When I was watching the video I gave a probability estimate of 65%, and I'm genuinely shocked that "not many" people thought he had over a 55% chance. This is Eliezer we're talking about.............

8Tenoke
He has been trying to do it for years and failed. The first time I read his attempts at doing that, years ago, I also assigned a high probability of success. Then 2 years passed and he hadn't done it, then another 2 years... You have to adjust your estimates based on your observations.
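For concreteness, a minimal sketch of that kind of update (the prior split and the per-window success chance are made-up numbers, purely illustrative):

```python
# Toy Bayesian update: how repeated failure should move your estimate.
# Hypothesis A: "he can do it", finishing any given 2-year window w.p. 0.5.
# Hypothesis B: "he can't do it", and never finishes.
p_can = 0.65              # illustrative prior that he can do it at all
p_finish_per_window = 0.5  # illustrative chance of finishing per 2-year window, if he can

for windows_observed in range(0, 4):
    # Likelihood of seeing this many consecutive failed windows under each hypothesis
    like_can = (1 - p_finish_per_window) ** windows_observed
    like_cant = 1.0
    posterior_can = p_can * like_can / (p_can * like_can + (1 - p_can) * like_cant)
    # Probability he finishes in the *next* window, given the evidence so far
    p_next = posterior_can * p_finish_per_window
    print(windows_observed, round(posterior_can, 2), round(p_next, 2))
```

With these numbers the estimate of success in the next window drops from about 33% to under 10% after three failed windows, which is the shape of the adjustment being described.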

I wonder if it negatively impacts the cohesiveness/teamwork ability of the resulting AI safety community by disproportionately attracting a certain type of person? It seems unlikely that everyone would enjoy this style

.

[This comment is no longer endorsed by its author]

FWIW you can bet on some of these on PredictIt -- for example, PredictIt assigns only a 47% chance Trump will win in 2020. That's not a huge difference, but still worth betting 5% of your bankroll (after fees) on if you bet half-Kelly. (if you want to bet with me for whatever reason, I'd also be willing to bet up to $700 that Trump doesn't win at PredictIt odds if I don't have to tie up capital)
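As a sanity check on the sizing, here is a minimal sketch of the half-Kelly arithmetic (the subjective probability below is an illustrative stand-in, not the number behind the 5% figure, and fees are only noted rather than modelled):

```python
# Kelly sizing for buying a binary contract at market price q,
# given subjective probability p that it pays out $1.
def kelly_fraction(p, q):
    """Fraction of bankroll to stake at full Kelly: f* = (p - q) / (1 - q)."""
    return (p - q) / (1 - q)

p = 0.60   # illustrative subjective P(Trump doesn't win), not Liam's actual number
q = 0.53   # "No" price implied by a 47-cent "Yes" contract, ignoring fees

full_kelly = kelly_fraction(p, q)
half_kelly = full_kelly / 2
print(f"full Kelly: {full_kelly:.1%}, half Kelly: {half_kelly:.1%}")
# PredictIt's ~10% fee on profits lowers the effective payout, so the
# post-fee stake comes out somewhat smaller than the raw half-Kelly figure.
```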

We can test whether the most popular books & music of 2019 sold fewer copies than the most popular books & music of 2009 (I might or might not look into this later)

GDP is 2x higher than in 2000

Why not use per capita real GDP (+25% since 2000)?

4Liron
Ah right yes, ok so not a huge increase. Meanwhile, I've seen real value of entry-level programmer compensation packages at least double since 2000, probably more like triple. I think my point about GDP growth helping the outside view is this: *some* significant chunks of sectors in the economy are getting significantly more productive. What kind of workers are producing more value? What are the characteristics of a job that enables more value creation? One where there's more leverage, i.e. an hour of work produces more economic value, without a corresponding increase in supply. And any sector where the leverage on time is increasing say 5x+ per decade is a good bet for an area where supply is trailing demand.

I'm thinking that if there were liquid prediction markets for amplifying ESCs, people could code bots to do exactly what John suggests and potentially make money. This suggests to me that there's no principled difference between the two ideas, though I could be missing something (maybe you think the bot is unlikely to beat the market?)

I think I'd feel differently about John's list if it contained things that weren't goodhartable, such as... I don't know, most things are goodhartable. For example, citation density does probably have an impact (not just a correlation) on credence score. But giving truth or credibility points for citations is extremely gameable. A score based on citation density is worthless as soon as it becomes popular because people will do what they would have anyway and throw some citations in on top. Popular authors may not even have to do that t... (read more)

Based on the quote from Jessica Taylor, it seems like the FDT agents are trying to maximize their long-term share of the population, rather than their absolute payoffs in a single generation? If I understand the model correctly, that means the FDT agents should try to maximize the ratio of FDT payoff : 9-bot payoff (to maximize the ratio of FDT:9-bot in the next generation). The algebra then shows that they should refuse to submit to 9-bots once the population of FDT agents gets high enough (Wolfram|Alpha link), without needing to drop the random encounter... (read more)

What's the difference between John's suggestion and amplifying ESCs with prediction markets? (not rhetorical)

2Elizabeth
I don't immediately see how they're related. Are you thinking people participating in the markets are answering based on proxies rather than truly relevant information?

I was somewhat confused by the discussion of LTFF grants being rejected by CEA; is there a public writeup of which grants were rejected?

habryka110

I don't think there is a public writeup. So here is a quick summary: 

  • Only one grant was ever rejected by CEA, which was the grant to Lauren Lee (after a round of logistical obstacles to the grant) with the reasoning of "it is outside of the historical scope of the fund"
  • The grant to give out copies of HPMOR to IMO winners was urgent and CEA couldn't complete the due-diligence process before the money was required, so a private donor ended up filling it before it was made
  • The $150k recommendation to CFAR was being reviewed by the CEA board, who had some concerns
... (read more)

In order to do this, the agent needs to be able to reason approximately about the results of their own computations, which is where logical uncertainty comes in

Why does being updateless require thinking through all possibilities in advance? Can you not make a general commitment to follow UDT, but wait until you actually face the decision problem to figure out which specific action UDT recommends taking?

7abramdemski
Sure, but what computation do you then do, to figure out what UDT recommends? You have to have, written down, a specific prior which you evaluate everything with. That's the problem. As discussed in Embedded World Models, a Bayesian prior is not a very good object for an embedded agent's beliefs, due to realizability/grain-of-truth concerns; that is, specifically because a Bayesian prior needs to list all possibilities explicitly (to a greater degree than, e.g., logical induction).

Well, it's been 8 years; how close are ML researchers to a "proto-AGI" with the capabilities listed? (embarrassingly, I have no idea what the answer is)

4interstice
As far as I know no one's tried to build a unified system with all of those capacities, but we do seem to have rudimentary learned versions of each of the capacities on their own.

Apparently an LW user did a series of interviews with AI researchers in 2011, some of which included a similar question. I know most LW users have probably seen this, but I only found it today and thought it was worth flagging here.

What are the competing explanations for high time preference?

A better way to phrase my confusion: How do we know the current time preference is higher than what we would see in a society that was genuinely at peace?

The competing explanations I was thinking of were along the lines of "we instinctively prefer having stuff now to having stuff later"


Yeah, I was implicitly assuming that initiating a successor agent would force Omega to update its predictions about the new agent (and put the $1m in the box). As you say, that's actually not very relevant, because it's a property of a specific decision problem rather than CDT or son-of-CDT.

(I apologize in advance if this is too far afield of the intended purpose of this post)

How does the claim that "group agents require membranes" interact with the widespread support for dramatically reducing or eliminating restrictions to immigration ("open borders" for short) within the EA/LW community? I can think of several possibilities, but I'm not sure which is true:

  • There actually isn't much support for open borders
  • Open borders supporters believe that "group agents require membranes" is a reasonable generaliatio
... (read more)
5Raemon
This definitely seems like a relevant application of the question (different specific topic that I was thinking about when writing the OP, but the fully fleshed out theory I was aiming at would have some kind of answer here) My off-the-cuff-made-up-guess is that there is some threshold of borders that matters for group agency, but if your borders are sufficiently permeable already, there's not much benefit to marginally increasing them. Like, I notice that I don't feel much benefit to the US having stronger borders because it's already a giant melting pot. But I have some sense that Japan actually benefits from group homogeneity, by getting to have some cultural tech that is impossible in melting-pot cities.  (I notice that I am pretty confused about how to resolve some conflicted principles, like, I do in fact think there's a benefit to open borders for general "increased trade and freedom has a bunch of important benefits" reasons, and I also think there are benefits to careful standards for group membership, and I don't know how to trade those off. I think that a lot of cultural benefits are really hard to pull off) I think the people who are anti open borders are thinking roughly in terms of "we can't coordinate on having particular principles if anyone can just show up whenever" (although many of them don't think through any explicit game theory).

Would trying to become less confused about commitment races before building a superintelligent AI count as a metaphilosophical approach or a decision theoretic one (or neither)? I'm not sure I understand the dividing line between the two.

5Wei Dai
Trying to become less confused about commitment races can be part of either a metaphilosophical approach or a decision theoretic one, depending on what you plan to do afterwards. If you plan to use that understanding to directly give the AI a better decision theory which allows it to correctly handle commitment races, then that's what I'd call a "decision theoretic approach". Alternatively, you could try to observe and understand what humans are doing when we're trying to become less confused about commitment races and program or teach an AI to do the same thing so it can solve the problem of commitment races on its own. This would be an example of what I call "metaphilosophical approach".

if you're interested in anything in particular, I'll be happy to answer.

I very much appreciate the offer! I can't think of anything specific, though; the comments of yours that I find most valuable tend to be "unknown unknowns" that suggest a hypothesis I wouldn't previously have been able to articulate.

Have you written anything like "cousin_it's life advice"? I often find your comments extremely insightful in a way that combines the best of LW ideas with wisdom from other areas, and would love to read more.

5cousin_it
Thank you! Not sure about writing a life advice column, that's not quite who I am, but if you're interested in anything in particular, I'll be happy to answer.

The prior probability ratio is 1:99, and the likelihood ratio is 20:1, so the posterior probability is 120:991 = 20:99, so you have probability of 20/(20+99) of having breast cancer.

What does "120:991" mean here?

4Buck
formatting problem, now fixed
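For anyone reconstructing the quoted arithmetic: it is odds-form Bayes, posterior odds = prior odds times likelihood ratio. A minimal sketch:

```python
from fractions import Fraction

# Odds-form Bayes: posterior odds = prior odds * likelihood ratio
prior_odds = (1, 99)        # 1% base rate of breast cancer -> odds of 1:99
likelihood_ratio = (20, 1)  # the test result is 20x likelier given cancer

post_for = prior_odds[0] * likelihood_ratio[0]       # 1 * 20 = 20
post_against = prior_odds[1] * likelihood_ratio[1]   # 99 * 1 = 99

p_cancer = Fraction(post_for, post_for + post_against)  # 20 / 119
print(f"posterior odds {post_for}:{post_against}, P = {float(p_cancer):.3f}")  # ~0.168
```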

After thinking about it some more, I don't think this is true.

A concrete example: Let's say there's a CDT paperclip maximizer in an environment with Newcomb-like problems that's deciding between 3 options.

1. Don't hand control to any successor

2. Hand off control to a "LDT about correlations formed after 7am, CDT about correlations formed before 7am" successor

3. Hand off control to a LDT successor.

My understanding is that the CDT agent would take the choice that causes the highest number of paperclips to be created (in ... (read more)

[This comment is no longer endorsed by its author]
5Rob Bensinger
This is true if we mean something very specific by "causes". CDT picks the action that would cause the highest number of paperclips to be created, if past predictions were uncorrelated with future events. If an agent can arbitrarily modify its own source code ("precommit" in full generality), then we can model "the agent making choices over time" as "a series of agents that are constantly choosing which successor-agent follows them at the next time-step". If Son-of-CDT were the same as LDT, this would be the same as saying that a self-modifying CDT agent will rewrite itself into an LDT agent, since nothing about CDT or LDT assigns special weight to actions that happen inside the agent's brain vs. outside the agent's brain.

That makes sense to me, but unfortunately I'm no closer to understanding the quoted passage. Some specific confusions:

  • What's the link between death rate and time preference? My best guess is that declining life expectancy implies scarcity, but I also don't get....
  • the link between scarcity and time preference? My best guess is that high time preference means people don't put the work in to ensure sufficient future productive capacity, but that doesn't help me understand the quote so I think I'm missing something.
  • I get why emer
... (read more)
3Benquo
Declining life expectancy suggests a general increase in scarcity. If the processes around you are trying to make things more scarce for you rather than less, then you're in something more like a conflict relation than a trade relation to them, and delayed gratification is much less feasible in wartime or other emergencies where you're likely to die if you don't get something done NOW. What are the competing explanations for high time preference?

Can someone explain/point me to useful resources to understand the idea of time preference as expressed in this post? In particular, I'm struggling to understand these sentences:

This suggests that near the center time preference has increased to the point where we’re creating scarcity faster than we’re alleviating it, while at the periphery scarcity is still actually being alleviated because there’s enough scarcity to go around, or perhaps marginal areas do not suffer so much from total mobilization.

I also don't understand ... (read more)

2Benquo
IRR, discount rate, and effective time preference are really the same thing as expressed in related domains.

I think quantitative easing is an example (if I understood the post correctly, which I'm not sure about). By buying up bonds, the government is putting more dollars into the economy, which reduces the "amount of stuff produced per dollar", thus creating scarcity (in other words, QE increases aggregate demand). To alleviate this pressure, people make more stuff in order to meet the excess demand (i.e. unemployment rates go down). Forcing the unemployment rate down is the same as "requiring almost everyone to do things"

Maybe the claim that climate scientists are liars? I don't know if it's true, but if I knew it were false I'd definitely downvote the post...

I understand that, but I don't see why #2 is likely to be achievable. Corrigibility seems very similar to Wei Dai's translation example, so it seems like there could be many deceptive actions that humans would intuitively recognize as not corrigible, but which would fool an early-stage LBO tree into assigning a high reward. This seems like it would be a clear example of "giving a behaviour a high reward because it is bad". Unfortunately I can't think of any good examples, so my intuition may simply be mistaken.

Incidentally, it see... (read more)

4William_S
For factored cognition: I think the reason #2 might be achievable relies on assuming that there is some reason the bad feature is selected in the first place and assigned high reward. For example, this might have happened because the agent ran a simulation forward, and then realized that if they punch the human they can take the reward button from them. The hope is that we can figure out that the simulation process happened and why it led to the wrong thing (or outlaw simulations of this form in the first place). For factored evaluation, I think the story is a bit different (relying on the other expert being able to understand the reasons for the reward assignment and point it out to the judge, but I don't think the judge needs to be able to find it on their own). No plans currently, but it would be interesting.

That makes sense; so it's a general method that's applicable whenever the bandwidth is too low for an individual agent to construct the relevant ontology?

2William_S
Yeah, that's my current picture of it.

plus maybe other properties

That makes sense; I hadn't thought of the possibility that a security failure in the HBO tree might be acceptable in this context. OTOH, if there's an input that corrupts the HBO tree, isn't it possible that the corrupted tree could output a supposed "LBO overseer" that embeds the malicious input and corrupts us when we try to verify it? If the HBO tree is insecure, it seems like a manual process that verifies its output must be insecure as well.

2William_S
One situation is: maybe an HBO tree of size 10^20 runs into a security failure with high probability, but an HBO tree of size 10^15 doesn't and is sufficient to output a good LBO overseer.

I don't understand the argument that a speed prior wouldn't work: wouldn't the abstract reasoner still have to simulate the aliens in order to know what output to read from the zoo earths? I don't understand how "simulate a zoo earth with a bitstream that is controlled by aliens in a certain way" would ever get a higher prior weight than "simulate an earth that never gets controlled by aliens". Is the idea that each possible zoo earth with simple-to-describe aliens has a relatively similar prior weight to the real earth, so they collectively have a much higher prior weight?

I think it’s likely that these markets would quickly converge to better predictions than existing political prediction markets

Why would you expect this to be true? I (and presumably many others) spend a lot of time researching questions on existing political prediction markets because I can win large sums ($1k+ per question) doing so. I don't see why anyone would have an incentive to put in a similar amount of time to win Internet Points, and as a result I don't see why these markets would outperform existing political prediction markets... (read more)

1Matthew Barnett
I think realistically, the expected value is much lower than $1k per question (on Predictit, for instance), unless you can beat the market by a very substantial margin.

Is there any information on how von Neumann came to believe Catholicism was the correct religion for Pascal's Wager purposes? "My wife is Catholic" doesn't seem like very strong evidence...

3Eli Tyre
I don't know why Catholicism. I note that it does seem to be the religion of choice for former atheists, or at least for rationalists. I know of several rationalists that converted to Catholicism, but none that have converted to any other religion.

How do you ensure that property #3 is satisfied in the early stages of the amplification process? Since no agent in the tree will have context, and the entire system isn't very powerful yet, it seems like there could easily be inputs that would naively generate a high reward "by being bad", which the overseer couldn't detect.

1William_S
Suppose an action is evaluated as a linear combination of a set of human interpretable features. The action "punch the human" could be selected because 1) many of the reward weights of these features could be wrong, or it could be selected because 2) there is one feature "this action prevents the human from turning me off" that is assigned high reward. I think the thing we'd want to prevent in this case is 2) but not 1), and I think that's more likely to be achievable.
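A minimal sketch of the distinction being drawn (the features, weights, and numbers are invented for illustration):

```python
# Action evaluation as a linear combination of human-interpretable features.
FEATURES = ["helps with task", "is polite", "prevents human from turning me off"]

def score(feature_values, weights):
    return sum(v * w for v, w in zip(feature_values, weights))

punch_the_human = [0.1, 0.0, 1.0]   # barely helps, not polite, blocks shutdown

# Failure mode 1: many weights are individually a bit off, and the errors
# happen to add up in the bad action's favour.
noisy_weights = [3.0, -0.2, 0.4]

# Failure mode 2: one feature ("prevents ... turning me off") is assigned
# a large positive weight -- the case the overseer most needs to catch.
bad_weights = [1.0, 1.0, 10.0]

print(score(punch_the_human, noisy_weights))  # modest score from accumulated error
print(score(punch_the_human, bad_weights))    # large score driven by one bad feature
```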

From an epistemic rationality perspective, isn't becoming less aware of your emotions and body a really bad thing? Not only does it give you false beliefs, but "not being in touch with your emotions/body" is already a stereotyped pitfall for a rationalist to fall into...


5Kaj_Sotala
Definitely. But note that according to the paper, the stress thing “was observed in a total of three participants”; he says that he then “went on to conduct other experiments” and found results along similar lines, and then gives the yoga and racism examples. So it’s not clear to me exactly how many individuals had that kind of a disconnect between their experience of stress and their objective level of stress; 3/50 at least sounds like a pretty small minority. I'm intending to flesh out my model further in a future post, but the short version is that I don't believe the loss of awareness to be an inevitable consequence of all meditation systems - though it is probably a real risk with some. Metaphorically, there are several paths that lead to enlightenment and some of them run the risk of reducing your awareness, but it seems to me entirely possible to take safer paths.

Is meta-execution HBO, LBO, or a general method that could be implemented with either? (my current credences: 60% LBO, 30% general method, 10% HBO)

3William_S
I think it's a general method that is most applicable in LBO, but might still be used in HBO (eg. an HBO overseer can read one chapter of a math textbook, but this doesn't let it construct an ontology that lets it solve complicated math problems, so instead it needs to use meta-execution to try to manipulate objects that it can't reason about directly).

How does this address the security issues with HBO? Is the idea that only using the HBO system to construct a "core for reasoning" reduces the chances of failure by exposing it to less inputs/using it for less total time? I feel like I'm missing something...

5William_S
I'd interpreted it as "using the HBO system to construct a "core for reasoning" reduces the chances of failure by exposing it to less inputs/using it for less total time", plus maybe other properties (eg. maybe we could look at and verify an LBO overseer, even if we couldn't construct it ourselves)

.

[This comment is no longer endorsed by its author]

Yep, I misread the page, my mistake

and from my perspective this is a good thing, because it means we've made moral progress as a society.

I know this is off-topic, but I'm curious how you would distinguish between moral progress and "moral going-in-circles" (don't know what the right word is)?

1AnthonyC
I don't know. In practice, I don't think I do. On the one hand, I look over what I know about the last few thousand years of history and find that the farther back I go, the more horrible many of the people who were, in their own time, considered saintly seem to me (St Augustine, for example). On another hand, I have the most famous moral teachers of history from Jesus and the Buddha and Mohammed and Confucius and so on, and I feel like as a society we have been grappling with the same handful of basic underlying moral principles for a really long time. And on yet another hand, I have Robin Hanson's discussion of forager and farmer values arguing for cyclic trends on an even longer timescale. I'm sure I can find a few more hands besides. If I had to give a more concrete answer, I might go with something like this: over time we try to individually and collectively reconcile our moral intuitions and ethical precepts with the actual world we live in, while at the same time we're developing better methods of evaluating arguments and evidence to reduce mistakes in thinking. We keep finding contradictions in the practices we inherited, and look for ways to resolve them, and so on average those discrepancies will decrease with time. For the past few thousand years, despite huge oscillations and real losses, there seems to me to be a general trend in some overall direction that involves greater wealth, more options for individuals to choose their own lives, and capacity for cooperation among strangers across larger distances. So I think, if you sent me forward a hundred years, that once I got over the shock and started to understand the new world I was in, I'd be able to look at morally significant changes and consistently evaluate them as gains vs losses, even the ones that intuitively horrify my early 21st-century expectations.

(Keeping in mind that I have nothing to do with the inquiry and can't speak for OP)

Why is it desirable for the inquiry to turn up a representative sample of unpopular beliefs? If that were explicitly the goal, I would agree with you; I'd also agree (?) that questions with that goal shouldn't be allowed. However, I thought the idea was to have some examples of unpopular opinions to use in a separate research study, rather than to directly research what unpopular beliefs LW holds.

If the conclusion of the research turns out to be "here is... (read more)

4Dagon
Heh. It's interesting to even try to define what "representative" means for something that is defined by unpopularity. I guess the best examples are those that are so reprehensible or ludicrous that nobody is willing to even identify them. I do understand your reluctance to give any positive feedback to an idea you abhor, even when it's relevant and limited to one post. I look forward to seeing what results from it - maybe it will move the window, as you seem to fear. Maybe it'll just be forgotten, as I expect.
4Zack_M_Davis
Okay, that makes sense.

I downvoted because I think the benefit of making stuff like this socially unacceptable on LW is higher than the cost of the OP getting one less response to their survey. The reasons it might be " strong-downvote-worthy had it appeared in most other possible contexts" still apply here, and the costs of replacing it with a less-bad example seem fairly minimal.

6Dagon
I've upvoted them because I think they are specifically appropriate and on-topic for this post, even though I agree that they'd be unwelcome on most of LW. When discussing (or researching) contrarian and unpopular ideas, it's a straight-up mistake (selection and survivorship bias) to limit those ideas to only the semi-contrarian ones that fit into the site's general https://en.wikipedia.org/wiki/Overton_window .
9Zack_M_Davis
Can you elaborate? I think the costs (in the form of damaging the integrity of the inquiry) are quite high. If you're going to crowdsource a list of unpopular beliefs, and carry out that job honestly, then the list is inevitably going to contain a lot of morally objectionable ideas. After all, being morally objectionable is a good reason for an idea to be unpopular! (I suppose the holders of such ideas might argue that the causal relationship between unpopularity and perception-of-immorality runs in the other direction, but we don't care what they think.) Now, I also enjoy our apolitical site culture, which I think reflects an effective separation of concerns: here, we talk about Bayesian epistemology. When we want to apply our epistemology skills to contentious object-level topics that are likely to generate "more heat than light", we take it to someone else's website. (I recommend /r/TheMotte.) That separation is a good reason to explicitly ban specific topics or hypotheses as being outside of the site's charter. But if we do that, then we can't compile a list of unpopular beliefs without lying about the results. Blatant censorship is the best kind!
5defilippis
Agreed. There’s no value in spreading this opinion

I think the US is listed because it's mandatory that we register for the draft

[This comment is no longer endorsed by its author]
4Said Achmiz
No such speculation is necessary; you need only to, you know, read the page, to see that the list is simply a list of countries with national service, period—whether compulsory or voluntary.

Euthanasia should be a universal right.

This doesn't sound non-normative at all?

1paul ince
It was just legalised in Western Australia. The second Australian state to do so.
3Davis_Kingsley
I can tell you that I at least found it abhorrent! :P

My current best-guess answer for what "HCH + annotated functional programming" with no indirection is:

Instead of initializing the tree with the generic question "what should the agent do next", you initialize the tree with the specific question you want an answer for. In the context of IDA, I think (??) this would be a question sampled from the distribution of questions you want the IDA agent to be able to answer well.

Is it fair to say the HCH + AFP part mainly achieves capability amplification, and the indirection part mainly achieves ... (read more)

Huh, I thought that all amplification/distillation procedures were intended as a way to approximate HCH, which is itself a tree. Can you not meaningfully discuss "this amplification procedure is like an n-depth approximation of HCH at step x", for any amplification procedure?

For example, the internal structure of the distilled agent described in Christiano's paper is unlikely to look anything like a tree. However, my (potentially incorrect?) impression is that the agent's capabilities at step x are identical to an HCH tree of depth x i... (read more)

4Rohin Shah
No, you can't. E.g. If your amplification procedure only allows you to ask a single subagent a single question, that will approximate a linear HCH instead of a tree-based HCH. If your amplification procedure doesn't invoke subagents at all, but instead provides more and more facts to the agent, it doesn't look anything like HCH. The canonical implementations of iterated amplification are trying to approximate HCH though. That sounds right to me.
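A rough sketch of the contrast (the Human interface here is hypothetical, not taken from any of the papers):

```python
class StubHuman:
    """Trivial stand-in for the human policy; only here to show the assumed interface."""
    def decompose(self, question):
        # A real human would break the question into subquestions; the stub doesn't.
        return []
    def single_subquestion(self, question):
        return question + " (narrowed)"
    def answer(self, question, subanswers):
        return f"answer to {question!r} using {len(subanswers)} subanswer(s)"

def hch_tree(question, depth, human):
    """Tree-structured HCH: the human may pose several subquestions, each
    answered by its own depth-(d-1) HCH tree."""
    if depth == 0:
        return human.answer(question, subanswers={})
    subqs = human.decompose(question)
    return human.answer(question, {q: hch_tree(q, depth - 1, human) for q in subqs})

def hch_linear(question, depth, human):
    """Linear HCH: each human consults only a single subagent, so the
    recursion forms a chain rather than a tree."""
    if depth == 0:
        return human.answer(question, subanswers={})
    subq = human.single_subquestion(question)
    return human.answer(question, {subq: hch_linear(subq, depth - 1, human)})

print(hch_tree("What should the agent do?", depth=2, human=StubHuman()))
print(hch_linear("What should the agent do?", depth=2, human=StubHuman()))
```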

Huh, what would you recommend I do to reduce my uncertainty around meta-execution (e.g. "read x", "ask about it as a top level question", etc)?

3Rohin Shah
I'm not sure how relevant meta-execution is any more, I haven't seen it discussed much recently. So probably you'd want to ask Paul, or someone else who was around earlier than I was.

Is this necessarily true? It seems like this describes what Christiano calls "delegation" in his paper, but wouldn't apply to IDA schemes with other capability amplification methods (such as the other examples in the appendix of "Capability Amplification").

3Rohin Shah
"Depth" only applies to the canonical tree-based implementation of IDA. If you slot in other amplification or distillation procedures, then you won't necessarily have "depths" any more. You'll still have recursion, and that recursion will lead to more and more capability. Where it ends up depends on your initial agent and how good the amplification and distillation procedures are.