AI Control in the context of AI Alignment is a category of plans that aim to ensure safety and benefit from AI systems, even if they are goal-directed and are actively trying to subvert your control measures. From The case for ensuring that powerful AIs are controlled:.. (read more)
Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for what the author argues is a fine-tuning approach that "uses archetypal data" to "embed Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 57.33% in the GPT-2-XL model after fine-tuning... (read more)
If you are new to LessWrong, the current iteration of this is the place to introduce yourself... (read more)
Repositories are pages that are meant to collect information and advice of a specific type or area from the LW community. .. (read more)
A threat model is a story of how a particular risk (e.g. AI) plays out... (read more)
A Self-Fulfilling Prophecy is a prophecy that, when made, affects the environment such that it becomes more likely. Similarly, a Self-Refuting Prophecy is a prophecy that, when made, makes itself less likely. This is also relevant for beliefs that can affect reality directly without being voiced: for example, the belief "I'm confident" can increase a person's confidence, thus making it true, while the opposite belief can reduce a person's confidence, thus also making it true... (read more)
A project announcement is what you might expect - an announcement of a project.
Posts that are about a project's announcement, but do not themselves announce anything, should not have this tag... (read more)
A rational agent is an entity which has a utility function, forms beliefs about its environment, evaluates the consequences of possible actions, and then takes the action which maximizes its utility. Rational agents are also referred to as goal-seeking. The concept of a rational agent is used in economics, game theory, decision theory, and artificial intelligence... (read more)
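A minimal sketch of that loop in code. The actions, outcomes, beliefs, and utilities below are invented purely for illustration and are not from the tag itself:

```python
# Minimal sketch of a rational agent choosing by expected utility.
# Everything concrete here (actions, outcomes, numbers) is made up.

def expected_utility(action, beliefs, utility):
    """Average the utility of each possible outcome, weighted by the agent's
    believed probability of that outcome given the action."""
    return sum(p * utility[outcome] for outcome, p in beliefs[action].items())

def choose(actions, beliefs, utility):
    """Take the action whose expected utility is highest."""
    return max(actions, key=lambda a: expected_utility(a, beliefs, utility))

beliefs = {
    "take umbrella":  {"stay dry": 1.0},
    "leave umbrella": {"stay dry": 0.7, "get soaked": 0.3},
}
utility = {"stay dry": 1.0, "get soaked": -2.0}

print(choose(list(beliefs), beliefs, utility))  # -> "take umbrella"
```

The whole decision procedure reduces to an argmax over expected utilities; everything interesting lives in where the beliefs and the utility function come from.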
Summaries of discussions, takeaways, etc. from LessWrong meetups that have already taken place.
Inkhaven is a 30-day residency where one has to publish posts every day, as part of an effort to grow stronger as a writer. While this has produced some excellent posts, it also produces a fair bit of noise, and many more hastily-written or experimental posts than usual.
Inkhaven-like posts emerge when other people try to imitate this manner on a smaller scale (e.g. Lightcone team members doing their own 1-week writing stints, or 'HalfHaven' where remote LessWrongers aim to post 30 posts over the course of two months).
Focuses on the intersection of frontier AI agents and traditional infrastructure security, including exploit detection, system persistence, and hardware-level attributability
Causal relationships are usually formalized as a directed acyclic graph from parent events to child events, together with a rule for computing the probability of each child given the state of its parents.
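For concreteness, here is a minimal sketch of that formalization with a made-up two-node graph (Rain causing WetGrass) and made-up probabilities:

```python
# A minimal sketch of a causal DAG with conditional probability tables.
# The graph, events, and probabilities are illustrative, not from the source.
from itertools import product

graph = {
    "Rain":     {"parents": [], "cpt": {(): 0.2}},                             # P(Rain)
    "WetGrass": {"parents": ["Rain"], "cpt": {(True,): 0.9, (False,): 0.1}},   # P(WetGrass | Rain)
}

def prob_true(node, given=None):
    """P(node=True), optionally given fixed probabilities for some ancestors."""
    given = given or {}
    parents = graph[node]["parents"]
    cpt = graph[node]["cpt"]
    total = 0.0
    # Sum over every configuration of the parents, weighting each
    # configuration by its probability.
    for combo in product([True, False], repeat=len(parents)):
        weight = 1.0
        for parent, value in zip(parents, combo):
            p = given.get(parent, prob_true(parent, given))
            weight *= p if value else 1.0 - p
        total += weight * cpt[combo]
    return total

print(prob_true("WetGrass"))                  # 0.2 * 0.9 + 0.8 * 0.1 = 0.26
print(prob_true("WetGrass", {"Rain": 1.0}))   # 0.9
```

Each node only needs a table over its own parents; the probability of any event then falls out of summing over parent configurations.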
In doxastic modal logic, the statement "P is a hyperstition" is written as □P→P. Modal reasoners that satisfy Löb's Theorem believe all personal hyperstitions. This can cause some problems for modal embedded agents. Löbian cooperation works by making mutual cooperation a collective hyperstition.
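A short derivation (in standard provability-logic notation, my rendering rather than the original text) of why a Löbian reasoner ends up believing any P for which it can prove the hyperstition property:

```latex
\begin{align*}
&\vdash \Box P \to P                   && \text{$P$ is a hyperstition for this reasoner} \\
&\vdash \Box(\Box P \to P)             && \text{necessitation} \\
&\vdash \Box(\Box P \to P) \to \Box P  && \text{L\"ob's Theorem} \\
&\vdash \Box P                         && \text{modus ponens} \\
&\vdash P                              && \text{line 1 and modus ponens}
\end{align*}
```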
A reasoning step is "logically valid" when that kind of step never produces a false conclusion from true premises. For example, in algebra, "Add 2 to both sides of the equation" is valid because it only produces true equations from true equations, while "Divide both sides by x" is invalid because x might be 0. So even if "2x = (y+1)x", letting x = 0 and y = 2, the original equation can be true while "2 = y + 1" is false. But "2x + 2 = (y+1)x + 2" will be true in every semantic model where the original equation is true.
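Writing out that counterexample explicitly (my rendering of the substitution x = 0, y = 2 already described above):

```latex
% Substituting x = 0, y = 2 into each equation:
\begin{align*}
2x &= (y+1)x     &  0 &= 0 && \text{true (the original equation holds)} \\
2 &= y+1         &  2 &= 3 && \text{false (dividing by } x \text{ was invalid)} \\
2x+2 &= (y+1)x+2 &  2 &= 2 && \text{true (adding 2 preserved truth)}
\end{align*}
```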
More generally in life, there's a question of "did you execute each local step of reasoning correctly", which can be considered apart from "did you arrive at the correct conclusion". Validity is a local property of a reasoning step or sequence; we can (and should) evaluate each step's validity separately from whether we agree with the premises or end up agreeing with the conclusion. For near-logical domains, this asks "Does the next proposition follow (with very high probability, given other things usually believed about the world or explicitly introduced as premises) from the previous proposition?" For probabilistic reasoning, informal validity asks, "Given everything else believed or introduced as a premise, is this next step adjusting probabilities by the right amount?" or "Does this kind of reasoning step in general produce well-calibrated conclusions from well-calibrated premises?"
Eg, consider why the ad hominem fallacy should be seen as "invalid" or a "locally invalid reasoning step" from this viewpoint. Suppose you start out with well-calibrated probabilities (things you say "60%" for, happen around 60% of the time). You assign 60% probability that the sky is blue. Then somebody says, "Yeah, well, people who believe in blueskyism are ugly" and you nod and adjust your credence in blueskyism down to 40%. Your odds just went from 3:2 to 2:3, so by Bayes's Rule you should've heard evidence with a likelihood ratio of 4:9 to produce that probability shift. Unless you already believe that false propositions are 225% as likely as true propositions to be believed by ugly people, you should already expect that believing an ad hominem argument is something that can produce ill-calibrated conclusions in expectation from well-calibrated premises.
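The arithmetic behind those numbers, in the odds form of Bayes's Rule (my rendering of the calculation in the paragraph above):

```latex
% Posterior odds = prior odds times likelihood ratio.
\begin{align*}
\text{prior odds of blueskyism} &= 60 : 40 = 3 : 2 \\
\text{posterior odds} &= 40 : 60 = 2 : 3 \\
\text{implied likelihood ratio} &= \frac{2/3}{3/2} = \frac{4}{9} \\
\text{equivalently}\quad
\frac{P(\text{ugly believers} \mid \text{blueskyism false})}{P(\text{ugly believers} \mid \text{blueskyism true})} &= \frac{9}{4} = 225\%
\end{align*}
```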
Scalable oversight is an approach to the problem of providing reliable supervision of outputs from AIs, even as they become smarter than humans. Often groups of weaker AIs supervise a stronger AI, or AIs are set in a debate with each other.
Scalable oversight used to be referred to as a set of AI alignment techniques, but these techniques usually work at the level of the incentives given to the AIs and have less to do with architecture.
By Ruthenis (summarized; includes level 0):
Inkhaven is a 30-day residency where one has to publish posts every day. While this likely helps one in the longer term, the shorter-term effect is more posts written with less effort to double-check the arguments and, as a result, with epistemic problems.
Inkhaven-like posts emerge when other people try to imitate this manner on a smaller scale (e.g. Lightcone team members doing their own 1-week writing stints).
The main problems with CEV include, firstly, the great difficulty of implementing such a program - “If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless.” Secondly, the possibility that human values may not converge. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004. He states that there's a "principled distinction between discussing CEV as an initial dynamic of Friendliness, and discussing CEV as a Nice Place to Live" and his essay was essentially conflating the two definitions.
ATOW (2026-04-03), Moore et al. (2026) is probably the best academic account of LLM-induced psychosis. They "analyze logs of conversations with LLM chatbots from 19 users who report having experienced psychological harms from chatbot use", where the users mostly came from a "support group for such chatbot users."
ML4Good is a France-based field-building organisation that runs AI Safety bootcamps.
We used to have a feature for crossposting to the EA Forum. It caused a lot of bugs that were difficult to deal with and didn't feel like it was pulling its weight, so we removed it in the latest update.
hey Chris and Mick! wanna include Atlas Computing? we're a fieldbuilding org scoping problems in AGI risk, which makes it easier to recruit expertise to lead orgs working on them.
we're also hiring: https://atlascomputing.org/jobs
our onepager here:
https://docs.google.com/document/d/1v9yVAkfnjrFwsp3jH5aYTwfwjVBsNYND/edit?usp=sharing&ouid=109085206565751232228&rtpof=true&sd=true