localdeity

I can also come up with a story where obviously it's cheaper and more effective to disable all of the nuclear weapons than it is to take over the world, so why would the AI do the second thing?

Erm... For preventing nuclear war on the scale of decades... I don't know what you have in mind for how it would disable all the nukes, but a one-off breaking of all the firing mechanisms isn't going to work.  They could just repair/replace that once they discovered the problem.  You could imagine some more drastic thing like blowing up the conventional explosives on the missiles so as to utterly ruin them, but in a way that doesn't trigger the big chain reaction.  But my impression is that, if you have a pile of weapons-grade uranium, then it's reasonably simple to make a bomb out of it, and since uranium is an element, no conventional explosion can eliminate that from the debris.  Maybe you can melt it, mix it with other stuff, and make it super-impure?

But even then, the U.S. and Russia probably have stockpiles of weapons-grade uranium.  I suspect they could make nukes out of that within a few months.  You would have to ruin all the stockpiles too.

And then there's the possibility of mining more uranium and enriching it; I feel like this would take a few years at most, possibly much less if one threw a bunch of resources into rushing it.  Would you ruin all uranium mines in the world somehow?

No, it seems to me that the only ways to reliably rule out nuclear war involve either using overwhelming physical force to prevent people from using or making nukes (like a drone army watching all the uranium stockpiles), or being able to reliably persuade the governments of all nuclear powers in the world to disarm and never make any new nukes.  The power to do either of these things seems tantamount to the power to take over the world.

why most perfect algorithms that recreate a strawberry on the molecular level destroy the planet as well.

Phrased like this, the answer that comes to mind is "Well, this requires at least a few decades' worth of advances in materials science and nanotechnology and such, plus a lot of expensive equipment that doesn't exist today, and e.g. if you want this to happen with high probability, you need to be sure that civilization isn't wrecked by nuclear war or other threats in upcoming decades, so if you come up with a way of taking over the world that has higher certainty than leaving humanity to its own devices, then that becomes the best plan."  Classic instrumental convergence, in other words.

The political version of the question isn't functionally the same as the skin cream version, because the former isn't a randomized intervention—cities that decided to add gun control laws seem likely to have other crime-related events and law changes at the same time, which could produce a spurious result in either direction.  So it's quite reasonable to say "My opinion is determined by my priors and the evidence didn't appreciably affect my position."
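As a toy illustration of that confounding worry (all numbers invented; this is not a model of any real dataset): a hidden "crime wave" variable that drives both the decision to pass the law and the later change in crime makes a zero-effect law look effective, while randomizing the "treatment" makes the spurious effect vanish.

```python
# Toy simulation: confounding in observational data vs. a randomized assignment.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Confounder: cities in the middle of a crime wave are more likely to pass the
# law AND more likely to see crime fall back toward baseline afterward.
crime_wave = rng.random(n) < 0.5
passed_law = rng.random(n) < np.where(crime_wave, 0.8, 0.2)

# The law's true effect here is zero; crime changes only via regression to the mean.
crime_change = np.where(crime_wave, rng.normal(-5, 2, n), rng.normal(0, 2, n))

obs = crime_change[passed_law].mean() - crime_change[~passed_law].mean()
print(f"Apparent effect, observational comparison: {obs:+.2f}")  # spuriously negative

# Randomized version: assign the 'law' by coin flip; apparent effect is ~0.
random_law = rng.random(n) < 0.5
rand = crime_change[random_law].mean() - crime_change[~random_law].mean()
print(f"Apparent effect, randomized assignment:   {rand:+.2f}")
```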

90% awful idea: "Genetic diversity" in computer programs for resistance to large-scale cyberattacks.

The problem: Once someone has figured out the right security hole in Tesla's software (and, say, broken into a server used to deliver software updates), they can use this to install their malicious code into all 5 million Teslas in the field (or maybe just one model, so perhaps 1 million cars), and probably make them all crash simultaneously and cause a catastrophe.

The solution: There will probably come a point where we can go through the codebase and pick random functions and say, "Claude, write a specification of what this function does", and then "Claude, take this specification and write a new function implementing it", and end up with different functions that accomplish the same task, which are likely to have different bugs.  Have every Tesla do this to its own software.  Then the virus or program that breaks into some Teslas will likely fail on others.

One reason this is horrible is that you would need an exceptionally high success rate for writing those replacement functions—else this process would introduce lots of mundane bugs, which might well cause crashes of their own.  That, or you'd need a very extensive set of unit tests to catch all such bugs—so extensive as to probably eat up most of your engineers' time writing them.  Though perhaps AIs could do that part.
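To make the proposal slightly more concrete, here's a minimal sketch of the rewrite-and-verify loop.  `ask_llm` and `tests_pass` are stand-ins I'm inventing for illustration, not real interfaces; as noted above, nearly all of the difficulty lives in making `tests_pass` strict enough to catch the mundane bugs.

```python
# Sketch of per-vehicle "software diversification": regenerate a few functions
# from LLM-written specs, keeping a rewrite only if the full test suite passes.
import random

def diversify(functions, ask_llm, tests_pass, k=3, max_attempts=3):
    """functions: dict mapping function name -> source code."""
    for name in random.sample(sorted(functions), k=min(k, len(functions))):
        original = functions[name]
        spec = ask_llm(f"Write a precise specification of this function:\n{original}")
        for _ in range(max_attempts):
            functions[name] = ask_llm(f"Implement this specification as one function:\n{spec}")
            if tests_pass(functions):
                break  # accept a behaviorally equivalent but structurally different rewrite
        else:
            functions[name] = original  # every attempt failed the tests: revert
    return functions
```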

To me, that will lead to an environment where people think that they are engaging with criticism without having to really engage with the criticism that actually matters. 

This is a possible outcome, especially if the above tactic were the only tactic to be employed.  That tactic helps reduce ignorance of the "other side" on the issues that get the steelmanning discussion, and hopefully also pushes away low-curiosity tribalistic partisans while retaining members who value deepening understanding and intellectual integrity.  There are lots of different ways for things to go wrong, and any complete strategy probably needs to use lots of tactics.  Perhaps the most important tactic would be to notice when things are going wrong (ideally early) and adjust what you're doing, possibly designing new tactics in the process.

Also, in judging a strategy, we should know what resources we assume we have (e.g. "the meetup leader is following the practice we've specified and is willing to follow 'reasonable' requests or suggestions from us"), and know what threats we're modeling.  In principle, we might sort the dangers by [impact if it happens] x [probability of it happening], enumerate tactics to handle the top several, do some cost-benefit analysis, decide on some practices, and repeat.
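A trivial sketch of that sorting step, with placeholder threats and made-up numbers, just to show the shape of the exercise:

```python
# Expected-impact triage: sort candidate dangers by impact x probability.
threats = [
    {"name": "charismatic bad actor steers the group", "impact": 9, "probability": 0.3},
    {"name": "steelmanning sessions become rote", "impact": 5, "probability": 0.5},
    {"name": "important criticism is never raised", "impact": 8, "probability": 0.2},
]
for t in sorted(threats, key=lambda t: t["impact"] * t["probability"], reverse=True):
    print(f'{t["impact"] * t["probability"]:4.1f}  {t["name"]}')
```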

If you frame the criticism as having to be about the mission of psychiatry, it's easy for people to see "Is it ethical to charge poor patients three-digit fees for no-shows?" as off-topic. 

My understanding/guess is that "Is it ethical to charge poor patients three-digit fees for no-shows?" is an issue where psychiatrists already know the options and their impacts, and the "likelihood of people actually coming to blows" comes from social-signaling dynamics: "If I say I don't charge them, this shows I'm in a comfortable financial position and that I'm compassionate toward poor patients" versus "If I say I do charge them, this opens me up to accusations (tinged with social-justice advocacy) of heartlessness and greed".  I would guess that many psychiatrists do charge the fees, but would hate being forced to admit it in public.  Anyway, the problem here is not that psychiatrists lack information on the issue, so there'd be little point in doing a steelmanning exercise about it.

That said, as you suggest, it is possible that people would spend their time steelmanning unimportant issues (and making 'criticism' of the "We need fifty Stalins" type).  But if we assume that we have one person who notices there's an important unaddressed issue, who has at least decent rapport with the meetup leader, then it seems they could ask for that issue to get steelmanned soon.  That could cover it.  (If we try to address the scenario where no one notices the unaddressed issue, that's a pretty different problem.)

I want to register high appreciation of Elizabeth for her efforts and intentions described here. <3

The remainder of this post is speculation about solutions.  "If one were to try to fix the problem", or perhaps "If one were to try to preempt this problem in a fresh community".  I'm agnostic about whether one should try.

Notes on the general problem:

  • I suspect lots of our kind of people are not enthusiastic about kicking people out.  I think several people have commented, on some cases of seriously bad actors, that it took way too long to actually expel them.
  • Therefore, the idea of confronting someone like Jacy and saying "Your arguments are bad, and you seem to be discouraging critical thinking, so we demand you stop it or we'll kick you out" seems like a non-starter in a few ways.
  • I guess one could have lighter policing of the form "When you do somewhat-bad things like that, someone will criticize you for it."  Sort of like Elizabeth arguing against Jacy.  In theory, if one threw enough resources at this, one could create an environment where Jacy-types faced consistent mild pushback, which might work to get them to either reform or leave.  However, I think this would take a lot more of the required resources (time, emotional effort) than the right people are inclined to give.
    • Those who enjoy winning internet fights... might be more likely to be Jacy-types in the first place.  The intersection of "happy to spend lots of time policing others' behavior" and "not having what seem like more important things to work on" and "embodies the principles we hope to uphold" might be pretty small.  The example that comes to mind is Reddit moderators, who have a reputation for being power-trippers.  If the position is unpaid, then it seems logical to expect that result.  So I conclude that, to a first approximation, good moderators must be paid.
    • Could LLMs help with this today?  (Obviously this would work specifically for online written stuff, not in-person.)  Identifying bad comments is one possibility; helping write the criticism is another.  (A rough sketch of the first possibility follows after this list.)
  • Beyond that, one could have "passive" practices, things that everyone was in the habit of doing, which would tend to annoy the bad actors while being neutral (or, hopefully, positive) to the good actors.
    • (I've heard that the human immune system, in certain circumstances, does basically that: search for antibodies that (a) bind to the bad things and (b) don't bind to your own healthy cells.  Of course, one could say that this is obviously the only sensible thing to do.)
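Returning to the LLM bullet above: a rough sketch of what comment triage might look like.  `ask_llm` is a placeholder for whatever model interface one actually has, and the prompt and YES/NO convention are invented for illustration; a human moderator would still make the final call.

```python
# Sketch of LLM-assisted comment triage for a human moderator to review.
def triage_comment(comment, ask_llm):
    verdict = ask_llm(
        "Does the following comment pressure readers to stop thinking critically, "
        "strawman opponents, or argue in bad faith?  Answer YES or NO, then explain.\n\n"
        + comment
    )
    flagged = verdict.strip().upper().startswith("YES")
    # Optionally draft the mild pushback described above, for a human to edit and post.
    draft = ask_llm("Draft a brief, civil reply pointing out the problem:\n\n" + comment) if flagged else None
    return flagged, draft
```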

Reading the transcript, my brain generated the idea of having a norm that pushes people to do exercises of the form "Keep your emotions in check as you enumerate the reasons against your favored position, or poke holes in naive arguments for your favored position" (and possibly alternate with arguing for your side, just for balance).  In this case, it would be "If you're advocating that everyone do a thing always, then enumerate exceptions to it".

Fleshing it out a bit more... If a group has an explicit mission, then it seems like one could periodically have a session where everyone "steelmans" the case against the mission.  People sit in a circle, raising their hands (or just speaking up) and volunteering counterarguments, as one person types them into a document projected onto a big screen.  If someone makes a mockery of a counterargument ("We shouldn't do this because we enjoy torturing the innocent/are really dumb/subscribe to logical fallacy Y"), then other people gain status by correcting them ("Actually, those who say X more realistically justify it by ..."): this demonstrates their intelligence, knowledge, and moral and epistemic strength.  Same thing when someone submits a good counterargument: they gain status ("Ooh, that's a good one") because it demonstrates those same qualities.

Do this for at least five minutes.  After that, pause, and then let people formulate the argument for the mission and attack the counterarguments.

Issues in transcript labeling (I'm curious how much of it was done by machine):

  • After 00:07:55, a line is unattributed to either speaker; looks like it should be Timothy.
  • 00:09:43 is attributed to Timothy but I think must be Elizabeth.
  • Then the next line is unattributed (should be Timothy).
  • After 00:14:00, unattributed (should be Timothy).
  • After 00:23:38, unattributed (should be Timothy).
  • After 00:32:34, unattributed (probably Elizabeth).

Grammatically, the most obvious interpretation is a universal quantification (i.e. "All men are taller than all women"), which I think is a major reason why such statements so often lead to objections of "But here's an exception!"  Maybe you can tell the audience that they should figure out when to mentally insert "... on average" or "tend to be".  Though there are also circumstances where one might validly believe that the speaker really means all.  I think it's best to put such qualified language into your statements from the start.

Are you not familiar with the term "vacuously true"?  I find this very surprising.  People who study math tend to make jokes with it.

The idea is that, if we were to render a statement like "Colorless green ideas sleep furiously" into formal logic, we'd probably take it to mean the universal statement "For all X such that X is a colorless green idea, X sleeps furiously".  A universal statement is logically equivalent to "There don't exist any counterexamples", i.e. "There does not exist X such that X is a colorless green idea and X does not sleep furiously".  Which is clearly true, and therefore the universal is equally true.
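In symbols, that's just the standard equivalence between a universal claim and the nonexistence of counterexamples:

```latex
\forall x\,\bigl(\mathrm{ColorlessGreenIdea}(x)\rightarrow\mathrm{SleepsFuriously}(x)\bigr)
\;\Longleftrightarrow\;
\neg\exists x\,\bigl(\mathrm{ColorlessGreenIdea}(x)\land\neg\,\mathrm{SleepsFuriously}(x)\bigr)
```

Since nothing satisfies $\mathrm{ColorlessGreenIdea}(x)$, the existential on the right is false, its negation is true, and so the universal on the left holds vacuously.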

There is, of course, some ambiguity when rendering English into formal logic.  It's not rare for English speakers to say "if" when they mean "if and only if", or "or" when they mean "exclusive or".  (And sometimes "Tell me which one", as in "Did you do A, or B?" "Yes." "Goddammit.")  Often this doesn't cause problems, but sometimes it does.  (In which case, as I've said, the solution is not to give their statement an ambiguous truth value, but rather to ask them to restate it less ambiguously.)

"Dragons are attacking Paris" seems most naturally interpreted as the definite statement "There's some unspecified number—but since I used the plural, it's at least 2—of dragons that are attacking Paris", which would be false.  One could also imagine interpreting it as a universal statement "All dragons are currently attacking Paris", which, as you say, would be vacuously true since there are no dragons.  However, in English, the preferred way to say that would be "Dragons attack Paris", as CBiddulph says.  "Dragons are attacking Paris" uses the present progressive tense, while "Dragons attack Paris" uses what is called the "simple present"/"present indefinite" tense.  Wiki says:

The simple present is used to refer to an action or event that takes place habitually, to remark habits, facts and general realities, repeated actions or unchanging situations, emotions, and wishes.[3] Such uses are often accompanied by frequency adverbs and adverbial phrases such as always, sometimes, often, usually, from time to time, rarely, and never.

Examples:

  • I always take a shower.
  • I never go to the cinema.
  • I walk to the pool.
  • He writes for a living.
  • She understands English.

This contrasts with the present progressive (present continuous), which is used to refer to something taking place at the present moment: I am walking now; He is writing a letter at the moment.

English grammar rules aren't necessarily universal and unchanging, but they do give at least medium-strength priors on how to interpret a sentence.

to the point where you can't really eliminate the context-dependence and vagueness via taboo (because the new words you use will still be somewhat context-dependent and vague)

You don't need to "eliminate" the vagueness, just reduce it enough that it isn't affecting any important decisions.  (And context-dependence isn't necessarily a problem if you establish the context with your interlocutor.)  I think this is generally achievable, and have cited the Eggplant essay on this.  And if it is generally achievable, then:

Richard is arguing against foundational pictures which assume these problems away, and in favor of foundational pictures which recognize them.

I think you should handle the problems separately.  In which case, when reasoning about truth, you should indeed assume away communication difficulties.  If our communication technology were so bad that 30% of our words got dropped from every message, the solution would not be to change our concept of meanings; the solution would be to get better at error correction, ideally at a lower level, but if necessary by repeating ourselves and asking for clarification a lot.
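As a toy version of that analogy (drop rate and message invented for illustration), plain repetition already recovers almost everything from a channel that loses 30% of the words:

```python
# Repetition as crude error correction over a channel that drops each word w.p. 0.3.
import random

random.seed(0)  # deterministic toy run

def transmit(words, drop_prob=0.3):
    return [w if random.random() > drop_prob else None for w in words]

def send_with_repeats(words, k):
    copies = [transmit(words) for _ in range(k)]
    # For each position, keep the first copy that survived, if any did.
    return [next((c[i] for c in copies if c[i] is not None), None)
            for i in range(len(words))]

message = "the meaning survives if we just repeat ourselves enough".split()
for k in (1, 2, 4):
    received = send_with_repeats(message, k)
    missing = sum(w is None for w in received)
    print(f"{k} transmission(s): {missing}/{len(message)} words still missing")
```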

Elsewhere there's discussion of concepts themselves being ambiguous.  That is a deeper issue.  But I think it's fundamentally resolved in the same way: always be alert for the possibility that the concept you're using is the wrong one, is incoherent or inapplicable to the current situation; and when it is, take corrective action, and then proceed with reasoning about truth.  Be like a digital circuit, where at each stage your confidence in the applicability of a concept is either >90% or <10%, and if you encounter anything in between, then you pause and figure out a better concept, or find another path in which this ambiguity is irrelevant.
