The below is the draft of a blog post I have about why I like AI doom liability. My dream is that people read it and decide "ah yes this is the main policy we will support" or "oh this is bad for a reason Daniel hasn't noticed and I'll tell him why". I think usually you're supposed to flesh out posts, but I'm not sure that adds a ton of information in this case.
AI doom liability is my favourite approach to AI regulation. I want to sell you all on it.
the basic idea
Hot take: if you think that we'll have at least 30 more years of future where geopolitics and nations are relevant, I think you should pay at least 50% as much attention to India as to China. Similarly large population, similarly large number of great thinkers and researchers. Currently seems less 'interesting', but that sort of thing changes over 30-year timescales. As such, I think there should probably be some number of 'India specialists' in EA policy positions that isn't dwarfed by the number of 'China specialists'.
For comparison, in a universe where EA existed 30 years ago we would have thought it very important to have many Russia specialists.
The Indian grammarian Pāṇini wanted to exactly specify what Sanskrit grammar was in the shortest possible length. As a result, he did some crazy stuff:
Pāṇini's theory of morphological analysis was more advanced than any equivalent Western theory before the 20th century. His treatise is generative and descriptive, uses metalanguage and meta-rules, and has been compared to the Turing machine wherein the logical structure of any computing device has been reduced to its essentials using an idealized mathematical model.
There are two surprising facts about this:
I've been obsessing about this for the last few days.
A complaint about AI pause: if we pause AI and then unpause, progress will then be really quick, because there's a backlog of improvements in compute and algorithmic efficiency that can be immediately applied.
One definition of what an RSP is: if a lab makes observation O, then they pause scaling until they implement protection P.
Doesn't this sort of RSP have the same problem with fast progress after pausing? Why have I never heard anyone make this complaint about RSPs? Possibilities:
Basically I just agree with what James said. But I think the steelman is something like: you should expect shorter (or no) pauses with an RSP if all goes well, because the precautions are matched to the risks. Like, the labs aim to develop safety measures which keep pace with the dangers introduced by scaling, and if they succeed at that, then they never have to pause. But even if they fail, they're also expecting that building frontier models will help them solve alignment faster. I.e., either way the overall pause time would probably be shorter?
It does seem like in order to not have this complaint about the RSP, though, you need to expect that it's shorter by a lot (like by many months or years). My guess is that the labs do believe this, although not for amazing reasons. Like, the answer which feels most "real" to me is that this complaint doesn't apply to RSPs because the labs aren't actually planning to do a meaningful pause.
Good point!
Man, my model of what's going on is:
...and these, taken together, should explain it.
A theory of how alignment research should work
(cross-posted from danielfilan.com)
Epistemic status:
Maybe obvious to everyone but me, or totally wrong (this doesn't really grapple with the challenges of working in a domain where an intelligent being might be working against you), but:
I agree that we probably want most theory to be towards the applied end these days due to short timelines. Empirical work needs theory in order to direct it, and theory needs empirics in order to remain grounded.
I continue to think that agent foundations research is kind of underrated. Like, we're supposed to do mechinterp to understand the algorithm models implement - but how do we know what algorithms are good?
It additionally seems likely to me that we are presently missing major parts of a decent language for talking about minds/models, and developing such a language requires (and would constitute) significant philosophical progress. There are ways to 'understand the algorithm a model is' that are highly insufficient/inadequate for doing what we want to do in alignment — for instance, even if one gets from where interpretability is currently to being able to replace a neural net by a somewhat smaller boolean (or whatever) circuit and is thus able to translate various NNs to such circuits and proceed to stare at them, one probably won't thereby be more than of the way to the kind of strong understanding that would let one modify a NN-based AGI to be aligned or build another aligned AI (in case alignment doesn't happen by default) (much like how knowing the weights doesn't deliver that kind of understanding). To even get to the point where we can usefully understand the 'algorithms' models implement, I feel like we might need to have answered sth like (1) what kind of syntax should we see thinking as having — for example, should we think of a model/mind as a library of small prog...
Shower thought[*]: the notion of a task being bounded doesn't survive composition. Specifically, say a task is bounded if the agent doing it is only using bounded resources and only optimising a small bit of the world to a limited extent. The task of 'be a human in the enterprise of doing research' is bounded, but the enterprise of research in general is not bounded. Similarly, being a human with a job vs the entire human economy. I imagine keeping this in mind would be useful when thinking about CAIS.
Similarly, the notion of a function being interpretable doesn't survive composition. Linear functions are interpretable (citation: the field of linear algebra), as is the ReLU function, but the consensus is that neural networks are not, or at least not in the same way.
I basically wish that the concepts that I used survived composition.
[*] Actually I had this on a stroll.
Frankfurt-style counterexamples for definitions of optimization
In "Bottle Caps Aren't Optimizers", I wrote about a type of definition of optimization that says system S is optimizing for goal G iff G has a higher value than it would if S didn't exist or were randomly scrambled. I argued against these definitions by providing a examples of systems that satisfy the criterion but are not optimizers. But today, I realized that I could repurpose Frankfurt cases to get examples of optimizers that don't satisfy this criterion.
A Frankfurt case is a thought experiment designed to disprove the following intuitive principle: "a person is morally responsible for what she has done only if she could have done otherwise." Here's the basic idea: suppose Alice is considering whether or not to kill Bob. Upon consideration, she decides to do so, takes out her gun, and shoots Bob. But unbeknownst to her, a neuroscientist had implanted a chip in her brain that would have forced her to shoot Bob if she had decided not to. That said, the chip didn't activate, because she did decide to shoot Bob. The idea is that she's morally responsible, even tho she couldn't have done otherwise.
Anyway, let's do this w...
Live in Berkeley? I think you should consider running for the city council. Why?
On the most recent episode of the podcast Rationally Speaking, David Shor discusses how members of the USA's Democratic Party could perform better electorally by not talking about their unpopular extreme views. But he notes that many individual Democrats have better lives if they do talk about those views, which are popular with left-wing activists (e.g. because they become more prominent and get to feel good about themselves), even though this causes some voters to associate those unpopular extreme views with the Democratic Party and not vote for it.
This is discussed as a sad irrationality that constitutes a coordination failure among Democrats, but I found that tone odd. Part of the model in the episode is that Democratic politicians in fact have these unpopular extreme views, but it would hurt their electoral chances if that became known. From a non-partisan perspective, you'd expect it to be a good thing to know what elected officials actually think. Now, you might think that elected officials shouldn't enact the unpopular policies that they in fact believe in, but it's odd to me that they apparently can't credibly communicate that they won't enact those policies. At any rate, I'm a bit bothered by the idea that coordinated silence, meant to ensure that people don't know what powerful people actually think, gets portrayed as good.
I think the use of dialogues to illustrate a point of view is overdone on LessWrong. Almost always, the 'Simplicio' character fails to accurately represent the smart version of the viewpoint he stands in for, because the author doesn't try sufficiently hard to pass the ITT of the view they're arguing against. As a result, not only is the dialogue unconvincing, it runs the risk of misleading readers about the actual content of a worldview. I think this is true to a greater extent than posts that just state a point of view and argue against it, because the dialogue format naively appears to actually represent a named representative of a point of view, and structurally discourages disclaimers of the type "as I understand it, defenders of proposition P might state X, but of course I could be wrong".
A bunch of my friends are very skeptical of the schooling system and promote homeschooling or unschooling as an alternative. I see where they're coming from, but I worry about the reproductive consequences of stigmatising schooling in favour of those two alternatives. Based on informal conversations, the main reason why people I know aren't planning on having more children is the time cost. A move towards normative home/unschooling would increase the time cost of children, and as such make them less appealing to prospective parents[*]. This in turn would reduce birth rates, worsening the problem that first-world countries face in the next couple of decades of a low working-age:elderly population ratio [EDIT: also, low population leading to less innovation, also low population leading to fewer people existing who get to enjoy life]. As such, I tentatively wish that home/unschooling advocates would focus on more institutional ways of supervising children, e.g. Sudbury schools, community childcare, child labour [EDIT: or a greater emphasis on not supervising children who don't need supervision, or similar things].
[*] This is the weakest part of my argument - it's possible that more people home/unschooling their kids would result in cooler kids that were more fun to be around, and this effect would offset the extra time cost (or kids who are more willing to support their elderly parents, perhaps). But given how lucrative the first world labour market is, I doubt it.
I often see (and sometimes take part in) discussion of Facebook here. I'm not sure whether when I partake in these discussions I should disclaim that my income is largely due to Good Ventures, whose money largely comes from Facebook investments. Nobody else does this, so shrug.
Why I am less than infinitely hostile to the Time/Bloomberg pieces:
One result that's related to Aumann's Agreement Theorem is that if you and I alternate saying our posterior probabilities of some event, we converge on the same probability if we have common priors. You might therefore wonder why we ever do anything else. The answer is that describing evidence is strictly more informative than stating one's posterior. For instance, imagine that we've both secretly flipped coins, and want to know whether both coins landed on the same side. If we just state our posteriors, we'll immediately converge to 50%, without actually learning the answer, which we could have learned pretty trivially by just saying how our coins landed. This is related to the original proof of the Aumann agreement theorem in a way that I can't briefly describe here.
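A minimal simulation of the coin example, just to make the point concrete (the function and variable names here are mine):

```python
import random

def posterior_match_given_own_coin(own_coin):
    # Whatever my own coin shows, the other coin is equally likely to be
    # heads or tails, so my posterior that the two coins match is 1/2.
    return 0.5

my_coin = random.choice(["H", "T"])
your_coin = random.choice(["H", "T"])

# Exchanging posteriors: we both announce 0.5 and immediately "agree",
# but neither of us learns whether the coins actually match.
print(posterior_match_given_own_coin(my_coin),
      posterior_match_given_own_coin(your_coin))

# Exchanging evidence: announcing the raw flips settles the question.
print("coins match:", my_coin == your_coin)
```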
Models and considerations.
There are two typical ways of deciding whether on net something is worth doing. The first is to come up with a model of the relevant part of the world, look at all the consequences of doing the thing in the model, and determine if those consequences are net positive. When this is done right, the consequences should be easy to evaluate and weigh off against each other. The second way is to think of a bunch of considerations in favour of and against doing something, and decide whether the balance of considerations supports doing the thing or not.
I prefer model-building to consideration-listing, for the following reasons:
Hot take: the norm of being muted on video calls is bad. It makes it awkward and difficult to speak, clap, laugh, or make "I'm listening" sounds. A better norm set would be:
As far as I can tell, people typically use the orthogonality thesis to argue that smart agents could have any motivations. But the orthogonality thesis is stronger than that, and its extra content is false - there are some goals that are too complicated for a dumb agent to have, because the agent couldn't understand those goals. I think people should instead directly defend the claim that smart agents could have arbitrary goals.
A rough and dirty estimate of the COVID externality of visiting your family in the USA for Christmas when you don't feel ill [EDIT: this calculation low-balls the externality, see below]:
You incur some number of μCOVIDs[*] a week, let's call it x. Since the incubation time is about 5 days, let's say that your chance of having COVID is about 5x/7,000,000 when you arrive at the home of your family with n other people. In-house attack rate is about 1/3, I estimate based off hazy recollections, so in expectation you infect 5xn/21,000,000 people, which is about...
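The arithmetic above as a quick sketch (the x and n values at the bottom are hypothetical placeholders, not recommendations):

```python
def expected_infections(weekly_microcovids, household_size):
    """Expected number of family members you infect, per the rough model above."""
    # x microCOVIDs per week = an x-in-a-million chance of catching COVID per week;
    # with a ~5 day incubation period, scale by 5/7 for the chance you have it on arrival.
    p_infected_on_arrival = (5 / 7) * weekly_microcovids / 1_000_000
    # In-household attack rate of roughly 1/3 per person.
    return p_infected_on_arrival * (1 / 3) * household_size

# Hypothetical example: 200 microCOVIDs/week, visiting 4 family members.
print(expected_infections(200, 4))  # = 5 * 200 * 4 / 21,000,000, i.e. about 1.9e-4
```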
Better to concretise 3 ways than 1 if you have the time.
Here's a tale I've heard but not verified: in the good old days, Intrade had a prediction market on whether Obamacare would become law, which resolved negative, due to the market's definition of Obamacare.
Sometimes you're interested in answering a vague question, like 'Did Donald Trump enact a Muslim ban in his first term?' or 'Will I be single next Valentine's day?'. Standard advice is to make the question more specific and concrete, turning it into something that can be more objectively evaluated. I think that th
...This weekend, I looked up Benquo's post on zetetic explanation in order to nominate it for the 2019 review. Alas, it was posted in 2018, and wasn't nominated for that year's review. Nevertheless, I've recently gotten interested in amateur radio, and have noticed that the mechanistic/physical explanations of radio waves and such that I've come across while studying for exams are not really sufficient to empower me to actually get on the radio, and more zetetic explanations are useful, altho harder to test. Anyway, I recommend re-reading the post.
My bid for forecasters: come up with conditional prediction questions to forecast likely impacts of potential US policies towards Ukraine. See this thread where I brainstorm potential such questions.
Challenges as I see it: figuring out which policies are live options, operationalizing, and figuring out good success/failure metrics.
Benefits: potentially make policy more sane, or more realistically practice doing the sort of thing that might one day make policy more sane.
Ted Kaczynski as a relatively apolitical test case for cancellation norms:
Ted Kaczynski was a mathematics professor who decided that industrial society was terrible, and waged a terroristic bombing campaign to foment a revolution against technology. As part of this campaign, he wrote a manifesto titled "Industrial Society and Its Future" and said that if a major newspaper printed it verbatim he would desist from terrorism. He is currently serving eight life sentences in a "super-max" security prison in Colorado.
My understanding is that his manifesto (which...
Generally speaking, if someone commits heinous and unambiguous crimes in service of an objective like "getting people to read X", and it doesn't look like they're doing a tricky reverse-psychology thing or anything like that, then we should not cooperate with that objective. If Kaczynski had posted his manifesto on LessWrong, I would feel comfortable deleting it and any links to it, and I would encourage the moderator of any other forum to do the same under those circumstances.
But this is a specific and unusual circumstance. When people try to cancel each other, usually there's no connection or a very tenuous connection between their writing and what they're accused of. (Also the crime is usually less severe and less well proven.) In that case, the argument is different: the people doing the cancelling think that the crime wasn't adequately punished, and are trying to create justice via a distributed minor punishment. If people are right about whether the thing is bad, then the main issues are about standards of evidence (biased readings and out-of-context quotes go a long way), proportionality (it's not worth blowing up people's lives over having said something dumb on the internet), and relation to nonpunishers (problems happen when things escalate from telling people why someone is bad, to punishing people for not believing or not caring).
I made this post with the intent to write a comment, but the process of writing the comment out made it less persuasive to me. The planning fallacy?
Here's a script I wrote to analyze how good Manifold Markets is at predicting Ukraine stuff. Basically: if you average market prices over the life of each market, it's about as good as you would be if you were calibrated at 80% accuracy; if you instead take the probabilities at each market's mid-point, it's about as good as being calibrated at 72% accuracy.
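I won't reproduce the script here, but for concreteness, here's a sketch of one way you could get a number like "as good as being calibrated at 80%": compute the average Brier score of the market's probabilities and invert it to find the confidence level at which an always-correct-at-that-confidence forecaster would score the same. The actual script may compute this differently; the `markets` list below is made-up example data.

```python
import math

def equivalent_accuracy(markets):
    """markets: list of (probability_assigned_to_YES, resolved_yes) pairs.

    Returns p such that a forecaster who always puts probability p on the
    correct outcome would have the same average Brier score.
    """
    brier = sum((p - (1.0 if yes else 0.0)) ** 2 for p, yes in markets) / len(markets)
    # A forecaster who always assigns p to the true outcome scores (1 - p)^2,
    # so invert: p = 1 - sqrt(brier).
    return 1 - math.sqrt(brier)

# Hypothetical data: (average market price for YES, whether it resolved YES).
markets = [(0.9, True), (0.7, True), (0.3, False), (0.6, False)]
print(equivalent_accuracy(markets))
```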
Some puzzles:
These seem like they should be related, but I don't quite know how. Maybe if someone thought about it for an hour they could figure it out.
Quantitative claims about code maintenance from Working in Public, plausibly relevant to discussion of code rot and machine intelligence:
Here's a project idea that I wish someone would pick up (written as a shortform rather than as a post because that's much easier for me):
This is a fun Aumann paper that talks about what players have to believe to be in a Nash equilibrium. Here, instead of imagining agents randomizing, we're instead imagining that the probabilities over actions live in the heads of the other agents: you might well know exactly what you're going to do, as long as I don't. It shows that in 2-player games, you can write down conditions that involve mutual knowledge but not common knowledge that imply that the players are at a Nash equilibrium: mutual knowledge of players' conjectures about each other, players' ...
Let it be known: I'm way more likely to respond to (and thereby algorithmically signal-boost) criticisms of AI doomerism that I think are dumb than those that I think are smart, because the dumb objections are easier to answer. Caveat emptor.
An attempt at rephrasing a shard theory critique of utility function reasoning, while restricting myself to things I basically agree with:
Yes, there are representation theorems that say coherent behaviour is optimizing some utility function. And yes, for the sake of discussion let's say this extends to reward functions in the setting of sequential decision-making (even tho I don't remember seeing a theorem for that). But: just because there's a mapping, doesn't mean that we can pull back a uniform measure on utility/reward functions to get a reasonable mea...
Here are two EA-themed podcasts that I think someone could make. Maybe that someone is you!
More or Less is a BBC Radio program. They take some number that's circulating around the news, and provide context like "Is that literally true? How could someone know that? What is that actually measuring? Is that a big number? Does that mean what you think it means?" - stuff like that. They spend about 10 minutes on each number, and usually include interviews with experts in the field. IMO, someone could do this for numb...
A sad fact is that good methods to elicit accurate probabilities of the outcome of some future process, e.g. who will win the next election, give you an incentive to influence that outcome, e.g. by campaigning and voting for the candidate you said was more likely to win. But with mind uploading and the 'right' theory of personal identity, we can fix this!
First, suppose that you think of all psychological descendants of your current self as 'you', but you don't think of descendants of your past self as 'you'. So, if you were to make a copy of yourself tomor...
Suppose there are two online identities, and you want to verify that they're associated with the same person. It's not too hard to verify this: for instance, you could tell one of them something secretly, and ask the other what you told the first. But how do you determine that two online identities are different people? It's not obvious how you do this with anything like cryptographic keys etc.
One way to do it if the identities always do what's causal-decision-theoretically correct is to have the two identities play a prisoner's dilemma with each other, an...
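For the first (same-person) check, here's a toy sketch of the tell-one-identity-a-secret idea, with the two 'identities' modelled as objects in a single script rather than as real accounts on a real platform:

```python
import secrets

class Identity:
    """Toy stand-in for an online identity; the person behind it holds `memory`."""
    def __init__(self, memory):
        self.memory = memory

    def tell(self, key, value):
        self.memory[key] = value

    def ask(self, key):
        return self.memory.get(key)

# If both identities are the same person, they share one memory.
shared = {}
alice_account_1 = Identity(shared)
alice_account_2 = Identity(shared)

challenge = secrets.token_hex(16)              # tell one identity a secret...
alice_account_1.tell("challenge", challenge)
print(alice_account_2.ask("challenge") == challenge)  # ...ask the other: True

# A distinct person has a separate memory and fails the check.
bob = Identity({})
print(bob.ask("challenge") == challenge)  # False
```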
Blog post request: a summary of all the UFO stuff and what odds I should put on alien visitations of earth.
'Seminar' announcement: me talking quarter-bakedly about products, co-products, deferring, and transparency. 3 pm PT tomorrow (actually 3:10 because that's how time works at Berkeley).
I was daydreaming during a talk earlier today (my fault, the talk was great), and noticed that one diagram in Dylan Hadfield-Menell's off-switch paper looked like the category-theoretic definition of the product of two objects. Now, in category theory, the 'opposite' of a product is a co-product, which in set theory is the disjoint union. So if the product of two actions is d...
Avoid false dichotomies when reciting the litany of Tarski.
Suppose I were arguing about whether it's morally permissible to eat vegetables. I might stop in the middle and say:
If it is morally permissible to eat vegetables, I desire to believe that it is morally permissible to eat vegetables. If it is morally impermissible to eat vegetables, I desire to believe that it is morally impermissible to eat vegetables. Let me not become attached to beliefs I may not want.
But this ignores the possibility that it's neither morally permissible nor morally impermi...
An interesting tension: it's kind of obvious from a micro-econ view that group houses should have Pigouvian taxes on uCOVIDs[*] (where I pay housemates for the chance I get them sick) rather than caps on how many uCOVIDs everyone can incur per week - and of course both of these are better than "just sort of be reasonable" or having no system. But uCOVID caps are nice in that they make it significantly easier to coordinate with other houses - it's much easier to figure out how risky interacting with somebody is when they can just tell you their cap, rather ...
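As a toy illustration of the tax version (all numbers below are made-up placeholders): price each μCOVID at the expected harm it imposes on your housemates, i.e. the chance it turns into infecting them times whatever cost the house agrees to put on a housemate catching COVID.

```python
def microcovid_tax(microcovids_incurred, housemates, attack_rate=1/3,
                   cost_of_housemate_case=10_000):
    """Pigouvian payment: expected external harm from the risk you took on.

    attack_rate and cost_of_housemate_case are placeholder numbers.
    """
    p_i_get_covid = microcovids_incurred / 1_000_000
    expected_housemate_cases = p_i_get_covid * attack_rate * housemates
    return expected_housemate_cases * cost_of_housemate_case

# e.g. 300 microCOVIDs in a week, 5 housemates:
print(microcovid_tax(300, 5))  # = 0.0003 * (1/3) * 5 * 10000 = $5.00
```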
FYI: I am not using the dialogue matching feature. If you want to dialogue with me, your best bet is to ask me. I will probably say no, but who knows.
Research project idea: formalize a set-up with two reinforcement learners, each training the other. I think this is what's going on in baby care. Specifically: a baby is learning in part by reinforcement learning: they have various rewards they like getting (food, comfort, control over environment, being around people). Some of those rewards are dispensed by you: food, and whether you're around them, smiling and/or mimicking them. Also, you are learning via RL: you want the baby to be happy, nourished, rested, and not cry (among other things). And the baby...
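Here's a very crude sketch of the kind of formalization I have in mind: nothing baby-specific, just two ε-greedy contextual-bandit learners where each one's reward signal is the other's behaviour.

```python
import random

class ContextualBandit:
    """Epsilon-greedy learner with one Q-value per (context, action) pair."""
    def __init__(self, n_contexts, n_actions, epsilon=0.1, lr=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_contexts)]
        self.epsilon, self.lr = epsilon, lr

    def act(self, context):
        if random.random() < self.epsilon:
            return random.randrange(len(self.q[context]))
        return max(range(len(self.q[context])), key=lambda a: self.q[context][a])

    def update(self, context, action, reward):
        self.q[context][action] += self.lr * (reward - self.q[context][action])

CRY, SMILE = 0, 1      # baby's actions
IGNORE, ATTEND = 0, 1  # caregiver's actions

baby = ContextualBandit(n_contexts=2, n_actions=2)       # context: caregiver's last action
caregiver = ContextualBandit(n_contexts=2, n_actions=2)  # context: baby's current action

last_caregiver_action = IGNORE
prev = None  # the caregiver's previous (context, action), rewarded one step later
for _ in range(5000):
    b = baby.act(last_caregiver_action)
    if prev is not None:
        # Reward the caregiver's last response according to whether the baby is now content.
        caregiver.update(*prev, reward=1.0 if b == SMILE else 0.0)
    c = caregiver.act(b)
    # The baby's reward is dispensed directly by the caregiver's response.
    baby.update(last_caregiver_action, b, reward=1.0 if c == ATTEND else 0.0)
    prev = (b, c)
    last_caregiver_action = c

print("baby Q-values:", baby.q)
print("caregiver Q-values:", caregiver.q)
```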
An argument for stock-picking:
I'm not sure whether I can pick stocks better than the market. But if I can, then money is more valuable to me in that world, since I have better-than-market opportunities in that world but only par-with-market opportunities in the EMH world. So I should buy stocks that look good to me, at least for a while, to check whether I'm in the world where I can do that, because it's a transfer from a world where money is less valuable to me to one where money is more valuable.
I think this argument goes thru if you assume market returns are equal in both worlds, which I think I think.
Results from an experiment I just found about inside vs outside view thinking (but haven't read the actual study, just the abstract: beware!)
Contrary to expectation, participants who assigned more importance to inside factors estimated longer completion times, and participants who gave greater weight to outside factors showed higher degrees of confidence in their estimates.
Excerpts from a FB comment I made about the principle of charity. Quote blocks are a person that I'm responding to, not me. Some editing for coherence has been made. tl;dr: it's fine to conclude that people are acting selfishly, and even to think that it's likely that they're acting selfishly on priors regarding the type of situation they're in.
The essence of charitable discourse is assuming that even your opponents have internally coherent and non-selfish reasons for what they do.
If this were true, then one shouldn't engage in charitable discourse. P...
A failure of an argument against sola scriptura (cross-posted from Superstimulus)
Recently, Catholic apologist Joe Heschmeyer has produced a couple of videos arguing against the Protestant view of the Bible - specifically, the claims of Sola Scriptura and Perspicuity (capitalized because I'll want to refer to them as premises later). "Sola Scriptura" has been operationalized a few different ways, but one way that most Protestants would agree on is (taken from the Westminster confession):
...The whole counsel of God, concerning all things necessary for [...] m
Rationality-related writings that are more comment-shaped than post-shaped. Please don't leave top-level comments here unless they're indistinguishable to me from something I would say here.