I can also come up with a story where obviously it's cheaper and more effective to disable all of the nuclear weapons than it is to take over the world, so why would the AI do the second thing?
Erm... For preventing nuclear war on the scale of decades... I don't know what you have in mind for how it would disable all the nukes, but a one-off breaking of all the firing mechanisms isn't going to work: those could just be repaired or replaced once the problem was discovered. You could imagine something more drastic, like blowing up the conventional explosives on the missiles so as to utterly ruin them, but in a way that doesn't set off the nuclear chain reaction. But my impression is that, if you have a pile of weapons-grade uranium, then it's reasonably simple to make a bomb out of it, and since uranium is an element, no conventional explosion can destroy it; the uranium could be recovered from the debris. Maybe you could melt it, mix it with other stuff, and make it super-impure?
But even then, the U.S. and Russia probably have stockpiles of weapons-grade uranium. I suspect they could make nukes out of that within a few months. You would have to ruin all the stockpiles too.
And then there's the possibility of mining more uranium and enriching it; I feel like this would take a few years at most, possibly much less if one threw a bunch of resources into rushing it. Would you ruin all uranium mines in the world somehow?
No, it seems to me that the only ways to reliably rule out nuclear war involve either using overwhelming physical force to prevent people from using or making nukes (like a drone army watching all the uranium stockpiles), or being able to reliably persuade the governments of all nuclear powers in the world to disarm and never make any new nukes. The power to do either of these things seems tantamount to the power to take over the world.
why most perfect algorithms that recreate a strawberry on the molecular level destroy the planet as well.
Phrased like this, the answer that comes to mind is "Well, this requires at least a few decades' worth of advances in materials science and nanotechnology and such, plus a lot of expensive equipment that doesn't exist today. And if you want this to happen with high probability, you need to be sure that civilization isn't wrecked by nuclear war or other threats in the intervening decades; so if taking over the world gives the project better odds than leaving humanity to its own devices does, then that becomes the best plan." Classic instrumental convergence, in other words.
The political version of the question isn't functionally the same as the skin cream version, because the former isn't a randomized intervention—cities that decided to add gun control laws seem likely to have other crime-related events and law changes at the same time, which could produce a spurious result in either direction. So it's quite reasonable to say "My opinion is determined by my priors and the evidence didn't appreciably affect my position."
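To make the confounding concern concrete, here's a toy simulation (my own sketch; all numbers are invented) in which the law has exactly zero effect on crime, yet a naive comparison shows one, because cities whose crime is already trending upward are more likely to pass the law:

```python
import math
import random

# Toy model with made-up numbers: the law has ZERO effect on crime, but cities
# whose crime is already trending upward are more likely to pass it.
random.seed(0)

def simulate_city():
    trend = random.gauss(0, 1)                       # underlying crime trend
    passes_law = random.random() < 1 / (1 + math.exp(-2 * trend))
    observed_change = trend + random.gauss(0, 0.5)   # the law contributes nothing
    return passes_law, observed_change

cities = [simulate_city() for _ in range(10_000)]
with_law = [c for p, c in cities if p]
without_law = [c for p, c in cities if not p]

print(sum(with_law) / len(with_law))        # positive: crime rose where the law passed
print(sum(without_law) / len(without_law))  # negative: crime fell elsewhere
```

A randomized intervention would break the link between the trend and the decision to pass the law, which is exactly what the observational city data can't give you.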
90% awful idea: "Genetic diversity" in computer programs for resistance to large-scale cyberattacks.
The problem: Once someone has figured out the right security hole in Tesla's software (and, say, broken into a server used to deliver software updates), they can use this to install their malicious code into all 5 million Teslas in the field (or maybe just one model, so perhaps 1 million cars), and probably make them all crash simultaneously and cause a catastrophe.
The solution: There will probably come a point where we can go through the codebase and pick random functions and say, "Claude, write a specification of what this function does", and then "Claude, take this specification and write a new function implementing it", and end up with different functions that accomplish the same task, which are likely to have different bugs. Have every Tesla do this to its own software. Then the virus or program that breaks into some Teslas will likely fail on others.
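A minimal sketch of what that per-car rewriting loop might look like, assuming a hypothetical llm(prompt) helper that calls some code-capable model (the function names here are mine, not any real Tesla or Anthropic API):

```python
import random
from typing import Callable, Dict

def diversify_function(source: str, llm: Callable[[str], str]) -> str:
    """Spec the function from its code, then reimplement it from the spec alone."""
    spec = llm("Write a precise specification of what this function does:\n" + source)
    return llm("Write a new implementation satisfying this specification:\n" + spec)

def diversify_codebase(functions: Dict[str, str],
                       llm: Callable[[str], str],
                       fraction: float = 0.1) -> Dict[str, str]:
    """Each car rewrites a random subset of its functions, so cars' binaries diverge."""
    chosen = set(random.sample(sorted(functions), k=max(1, int(fraction * len(functions)))))
    return {name: diversify_function(code, llm) if name in chosen else code
            for name, code in functions.items()}
```

In practice, each rewritten function would have to pass the existing regression tests before the car trusted it, which is where the next objection bites.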
One reason this is horrible is that you would need an exceptionally high success rate for writing those replacement functions—else this process would introduce lots of mundane bugs, which might well cause crashes of their own. That, or you'd need a very extensive set of unit tests to catch all such bugs—so extensive as to probably eat up most of your engineers' time writing them. Though perhaps AIs could do that part.
To me, that will lead to an environment where people think that they are engaging with criticism without having to really engage with the criticism that actually matters.
This is a possible outcome, especially if the above tactic were the only tactic to be employed. That tactic helps reduce ignorance of the "other side" on the issues that get the steelmanning discussion, and hopefully also pushes away low-curiosity tribalistic partisans while retaining members who value deepening understanding and intellectual integrity. There are lots of different ways for things to go wrong, and any complete strategy probably needs to use lots of tactics. Perhaps the most important tactic would be to notice when things are going wrong (ideally early) and adjust what you're doing, possibly designing new tactics in the process.
Also, in judging a strategy, we should know what resources we assume we have (e.g. "the meetup leader is following the practice we've specified and is willing to follow 'reasonable' requests or suggestions from us"), and know what threats we're modeling. In principle, we might sort the dangers by [impact if it happens] x [probability of it happening], enumerate tactics to handle the top several, do some cost-benefit analysis, decide on some practices, and repeat.
If you frame the criticism as having to be about the mission of psychiatry, it's easy for people to see "Is it ethical to charge poor patients three-digit fees for no-shows?" as off-topic.
My understanding/guess is that "Is it ethical to charge poor patients three-digit fees for no-shows?" is an issue where the psychiatrists know the options and the impacts of the options, and the "likelihood of people actually coming to blows" comes from social signaling things like "If I say I don't charge them, this shows I'm in a comfortable financial position and that I'm compassionate for poor patients"/"If I say I do charge them, this opens me up to accusations (tinged with social justice advocacy) of heartlessness and greed". I would guess that many psychiatrists do charge the fees, but would hate being forced to admit it in public. Anyway, the problem here is not that psychiatrists are unaware of information on the issue, so there'd be little point in doing a steelmanning exercise about it.
That said, as you suggest, it is possible that people would spend their time steelmanning unimportant issues (and making 'criticism' of the "We need fifty Stalins" type). But if we assume that we have one person who notices there's an important unaddressed issue, who has at least decent rapport with the meetup leader, then it seems they could ask for that issue to get steelmanned soon. That could cover it. (If we try to address the scenario where no one notices the unaddressed issue, that's a pretty different problem.)
I want to register high appreciation of Elizabeth for her efforts and intentions described here. <3
The remainder of this post is speculation about solutions. "If one were to try to fix the problem", or perhaps "If one were to try to preempt this problem in a fresh community". I'm agnostic about whether one should try.
Notes on the general problem:
Reading the transcript, my brain generated the idea of having a norm that pushes people to do exercises of the form "Keep your emotions in check as you enumerate the reasons against your favored position, or poke holes in naive arguments for your favored position" (and possibly alternate with arguing for your side, just for balance). In this case, it would be "If you're advocating that everyone do a thing always, then enumerate exceptions to it".
Fleshing it out a bit more... If a group has an explicit mission, then it seems like one could periodically have a session where everyone "steelmans" the case against the mission. People sit in a circle, raising their hands (or just speaking up) and volunteering counterarguments, as one person types them into a document projected onto a big screen. If someone offers a mockery of a counterargument ("We shouldn't do this because we enjoy torturing the innocent/are really dumb/subscribe to logical fallacy Y"), then other people gain status by correcting them ("Actually, those who say X more realistically justify it by ..."): this demonstrates their intelligence, knowledge, and moral and epistemic strength. Same thing when someone submits a good counterargument: they gain status ("Ooh, that's a good one") because it demonstrates those same qualities.
Do this for at least five minutes. After that, pause, and then let people formulate the argument for the mission and attack the counterarguments.
Issues in transcript labeling (I'm curious how much of it was done by machine):
Grammatically, the most obvious interpretation is a universal quantification (i.e. "All men are taller than all women"), which I think is a major reason why such statements so often lead to objections of "But here's an exception!" Maybe you can tell the audience that they should figure out when to mentally insert "... on average" or "tend to be". Though there are also circumstances where one might validly believe that the speaker really means all. I think it's best to put such qualifying language into your statements from the start.
Are you not familiar with the term "vacuously true"? I find this very surprising. People who study math tend to make jokes with it.
The idea is that, if we were to render a statement like "Colorless green ideas sleep furiously" into formal logic, we'd probably take it to mean the universal statement "For all X such that X is a colorless green idea, X sleeps furiously". A universal statement is logically equivalent to "There don't exist any counterexamples", i.e. "There does not exist X such that X is a colorless green idea and X does not sleep furiously". Which is clearly true, and therefore the universal is equally true.
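Spelling that out symbolically (one possible formalization; the predicate names are chosen just for illustration):

$$\forall x\,\big(\mathrm{ColorlessGreenIdea}(x) \rightarrow \mathrm{SleepsFuriously}(x)\big) \;\equiv\; \neg\,\exists x\,\big(\mathrm{ColorlessGreenIdea}(x) \wedge \neg\,\mathrm{SleepsFuriously}(x)\big)$$

Since nothing satisfies ColorlessGreenIdea(x), the right-hand side holds trivially, so the left-hand universal is vacuously true.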
There is, of course, some ambiguity when rendering English into formal logic. It's not rare for English speakers to say "if" when they mean "if and only if", or "or" when they mean "exclusive or". (And sometimes "Tell me which one", as in "Did you do A, or B?" "Yes." "Goddammit.") Often this doesn't cause problems, but sometimes it does. (In which case, as I've said, the solution is not to give their statement an ambiguous truth value, but rather to ask them to restate it less ambiguously.)
"Dragons are attacking Paris" seems most naturally interpreted as the definite statement "There's some unspecified number—but since I used the plural, it's at least 2—of dragons that are attacking Paris", which would be false. One could also imagine interpreting it as a universal statement "All dragons are currently attacking Paris", which, as you say, would be vacuously true since there are no dragons. However, in English, the preferred way to say that would be "Dragons attack Paris", as CBiddulph says. "Dragons are attacking Paris" uses the present progressive tense, while "Dragons attack Paris" uses what is called the "simple present"/"present indefinite" tense. Wiki says:
The simple present is used to refer to an action or event that takes place habitually, to remark habits, facts and general realities, repeated actions or unchanging situations, emotions, and wishes.[3] Such uses are often accompanied by frequency adverbs and adverbial phrases such as always, sometimes, often, usually, from time to time, rarely, and never.
Examples:
- I always take a shower.
- I never go to the cinema.
- I walk to the pool.
- He writes for a living.
- She understands English.
This contrasts with the present progressive (present continuous), which is used to refer to something taking place at the present moment: I am walking now; He is writing a letter at the moment.
English grammar rules aren't necessarily universal and unchanging, but they do give at least medium-strength priors on how to interpret a sentence.
Interesting. The natural approach is to imagine that you just have a 3-sided die with 2, 4, 6 on the sides, and if you do that, then I compute A = 12 and B = 6[1]. But, as the top Reddit comment's edit points out, the difference between that problem and the one you posed is that your version heavily weights the probability towards short sequences—that weighting being 1/2^n for a sequence of length n. (Note that the numbers I got, A=12 and B=6, are so much higher than the A≈2.7 and B=3 you get.) It's an interesting selection effect.
The thing is that, if you roll a 6 and then a non-6, in an "A" sequence you're likely to just die due to rolling an odd number before you succeed in getting the double 6, and thus exclude the sequence from the surviving set; whereas in a "B" sequence there's a much higher chance you'll roll a 6 before dying, and thus include this longer "sequence of 3+ rolls" in the set.
To illustrate with an extreme version, consider:
Obviously that's one way to reduce A to 2.
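If you want to see the selection effect numerically before the 3-sided-die calculation below, here's a quick rejection-sampling check (my own sketch): roll a fair six-sided die, discard any sequence containing an odd roll, and average the lengths of the survivors. It reproduces the ≈2.7 and 3 figures for the conditioned versions of A and B.

```python
import random

def conditional_mean_length(stop, trials=1_000_000):
    """Average length of roll sequences that hit `stop` before any odd roll."""
    total, kept = 0, 0
    for _ in range(trials):
        rolls = []
        while True:
            r = random.randint(1, 6)
            if r % 2 == 1:          # odd roll: discard this whole sequence
                rolls = None
                break
            rolls.append(r)
            if stop(rolls):
                break
        if rolls is not None:
            total += len(rolls)
            kept += 1
    return total / kept

two_sixes_in_a_row = lambda rolls: rolls[-2:] == [6, 6]
two_sixes_total = lambda rolls: rolls.count(6) == 2

print(conditional_mean_length(two_sixes_in_a_row))  # ≈ 2.73 (conditioned "A")
print(conditional_mean_length(two_sixes_total))     # ≈ 3.0  (conditioned "B")
```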
Excluding odd rolls completely, so the die has a 1/3 chance of rolling 6 and a 2/3 chance of rolling an even number that's not 6, we have:
A = 1 + 1/3 * A2 + 2/3 * A
Where A2 represents "the expected number of die rolls until you get two 6's in a row, given that the last roll was a 6". Subtracting 2/3 * A from both sides and then multiplying by 3 yields:
A = 3 + A2
And if we consider rolling a die from the A2 state, we get:
A2 = 1 + 1/3 * 0 + 2/3 * A
= 1 + 2/3 * A
Substituting:
A = 3 + 1 + 2/3 * A
Subtracting 2/3 * A from both sides:
1/3 * A = 4
Then multiplying by 3:
A = 12
For B, a similar approach yields the equations:
B = 1 + 1/3 * B2 + 2/3 * B
B2 = 1 + 1/3 * 0 + 2/3 * B2
And the reader may solve for B = 6.
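If you'd rather let a computer check the algebra, both linear systems can be handed to sympy directly (a quick verification sketch of the equations above):

```python
from sympy import Rational, solve, symbols

A, A2, B, B2 = symbols("A A2 B B2")
r = Rational(1, 3)  # probability of rolling a 6 on the {2, 4, 6} die

sol_A = solve([A - (1 + r * A2 + (1 - r) * A),
               A2 - (1 + r * 0 + (1 - r) * A)], [A, A2])
sol_B = solve([B - (1 + r * B2 + (1 - r) * B),
               B2 - (1 + r * 0 + (1 - r) * B2)], [B, B2])

print(sol_A)  # {A: 12, A2: 9}
print(sol_B)  # {B: 6, B2: 3}
```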