Best of LessWrong 2019

Power allows people to benefit from immoral acts without having to take responsibility or even be aware of them. The most powerful person in a situation may not be the most morally culpable, as they can remain distant from the actual "crime". If you're not actively looking into how your wants are being met, you may be unknowingly benefiting from something unethical.

trevor
An example of the kind of high-level psychological manipulation you see from major newspapers: this single paragraph from the NYT's front-page response to Trump's victory may have intended, in this order:

1. Depicting Biden and Harris as failed opposition to polarization, rather than active contributors to it.
2. Reminding readers that they originally 'promised' to heal divisions, diverting attention from the fact that many at the time understood this promise was superficial.
3. Insinuating that further polarization might be inescapable, with institutions like the NYT powerless against it rather than leading it.
4. Directly stating that Biden and Harris were weakened by a failure to understand and "channel" polarization.
5. Concluding with a true and helpful statement describing how Trump was empowered by polarization, while leaving out how the news corporations' business model originally incentivised him to win the 2016 Republican primary via notoriety maximization.

I probably didn't understand that entire paragraph and its dynamics as well as the author did, but taken in context with the rest of the article, the author's audacity is pretty clear. Language is sufficiently flexible for such a density of spells to be loaded into a paragraph this short, and news corporations have become sophisticated at operating within the constraint of maintaining a veneer of neutrality.

Polarization and news corporations are like climate change and fossil fuel companies: there's an incredible difference in statements and behavior between institutions incentivized to understand a phenomenon (much of academia actually still does a great job researching polarization and treating it as something to mitigate) and institutions incentivized to maximize it and profit off the results.
Fun unimportant thought: as tunnel density increases, there's a phase shift in how warfare works (in the area with dense tunnels).

Consider e.g. a city that has a network of tunnels/catacombs/etc. underneath. Attackers can advance on the surface, and/or they can advance under the ground. For tunnel networks of historically typical density, it's better to advance on the surface. Why? Because (a) the networks are sparse enough that the defenders can easily man every chokepoint underground, and (b) not being able to use airplanes or long-ranged weaponry underground seems to advantage the defender more than the attacker (e.g. can't use artillery to soften up defenders, can't use tanks, can't scout or bomb from the air). OTOH your attacking forces can get real close before they themselves can be shot at -- but this doesn't seem to be sufficient compensation.

Well, as the density of the network increases, eventually factor (a) reverses. Imagine a network so dense that in a typical 1km stretch of frontline, there are 100 separate tunnels passing beneath, such that you'd need at least 100 defensive chokepoints or else your line would have an exploitable hole. Not enough? Imagine that it's 1000... The point is, at some point it becomes more difficult to defend underground than to defend on the surface. Beyond that point fighting would mostly happen underground, and it would be... infantry combat, but in three dimensions? Lots of isolated squads of men exploring dark tunnel networks, occasionally clashing with each other, forming impromptu chokepoints and seeking to outflank?

Factor (b) might reverse as well. Perhaps it's entangled with factor (a). In a superdense tunnel network, I could see it being the case that the advantages of the attacker (being able to advance unseen, being able to advance without being caught in fields of fire by machineguns and artillery, until they are literally in the same hallway as the enemy) outweigh the advantages of the defender (good cover e
One question I have about the Mechanistic Interpretability safety case sketch from Roger Grosse: how do you actually find features of high concern?

Features of high concern are things like "intent to carry out a harmful plan". By definition, they very rarely activate on benign deployment outputs. This must be the case, as "if the features [of high concern] activate more than a handful of times in deployment (internal or external), we discontinue the model and roll back to a weaker model." [M1.1]

They propose to find these features [S1.1] by performing dictionary learning of some form, and then some combination of:

* autointerp on the learned features – which currently requires examples of the feature activating. I guess you can also look at attribution of bad outcomes to features?
* looking at what features activate "when the model is instructed or perturbed to be misaligned."

Generally, dictionary learning only learns features that actually occur on some inputs. So you need to do something outside of the normal deployment distribution. Some ideas (many alluded to in the document):

* Learn the features on the base model (or an 'agentic-only' version of the model) and then hope they transfer to the safety-trained model.
* Learn the features by gathering the activations on off-policy transcripts of harmful behavior. This is what Fabien Roger does in coup-probes (see the probe sketch below).
* Prompting the model to behave badly. This probably can't be sufficient for features of high concern – you don't want your model to take catastrophic actions even when instructed to do so! Maybe if you catch jailbreaks with a separate mechanism that you disable?
* Perturbing the model to behave badly. For example:
  * Something in the genre of Mack and Turner's mechanistically eliciting latent behaviors – e.g. this works to discover sleeper agents.
  * Maybe steering for related features? It's not clear to me how you do dictionary learning on a steered model and then apply those features to the uns
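A minimal sketch of the off-policy probing idea mentioned in the list above (in the spirit of coup-probes), not anything from Grosse's document: it assumes you have already cached model activations on harmful and benign transcripts, and every file name and array below is a hypothetical placeholder.

```python
# Hypothetical sketch: fit a linear probe on cached activations from off-policy
# harmful vs. benign transcripts, then score deployment activations with it.
# All file names and shapes are made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

harmful_acts = np.load("harmful_transcript_acts.npy")   # (n_harmful, d_model)
benign_acts = np.load("benign_transcript_acts.npy")     # (n_benign, d_model)

X = np.concatenate([harmful_acts, benign_acts])
y = np.concatenate([np.ones(len(harmful_acts)), np.zeros(len(benign_acts))])

probe = LogisticRegression(max_iter=1000).fit(X, y)

deployment_acts = np.load("deployment_acts.npy")        # (n_deploy, d_model)
scores = probe.predict_proba(deployment_acts)[:, 1]
flagged = scores > 0.99   # very conservative threshold, tuned on held-out data
print(f"{int(flagged.sum())} of {len(flagged)} deployment samples flagged")
```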
Mindmeld

In theory, AIs can transmit information far faster and more directly than humans: they can directly send weight/activation vectors to one another. The most important variable in whether entities (cells, organisms, polities, companies, ideologies, empires, etc.) stay individuals or amalgamate into a superorganism is communication bandwidth & copy fidelity. Both of these differ by many orders of magnitude for humans versus AIs. At some point, mere communication becomes a literal melding of minds. It seems quite plausible, then, that AIs will tend to mindmeld if left alone.

Current state-of-the-art fibre optic cables can transmit up to 10 terabits a second. That's probably a wild overestimate for AI communication though. More relevant bottlenecks are limits on processing information [plausibly more in the megabits range] and limits on transferability of activation vectors (but training could improve this).
keltan
I recently saw a tweet that asked GPT what micro-skills a person should master. I asked Claude a similar question, though I had preloaded the conversation with "things a modern day Aristotle would teach a 24yo student". I think this is probably a valuable exercise to undertake yourself. I'm providing my prompt and Claude's response as an 'appetiser' to get you interested in this task.

— —

1. Cold Reading & Behavioral Analysis
* Understanding micro-expressions
* Picking up on subtle vocal tone changes
* Recognizing behavioral patterns in groups
* Reading body language clusters rather than isolated gestures
* This isn't about manipulation, but rather developing deeper empathy and understanding

2. Strategic Silence & Information Management
* Knowing when not to speak
* Understanding information asymmetry
* The power of selective disclosure
* Using thoughtful pauses in conversation
* The art of asking rather than telling

3. Scenario Planning & Secondary Consequence Analysis
* Ability to see 3-4 moves ahead in decisions
* Understanding cascade effects
* Identifying hidden risks and opportunities
* Planning for both success and failure scenarios
* Recognition of system dynamics

4. Social Capital Accounting
* Understanding relationship networks
* Tracking favors and obligations implicitly
* Building genuine reciprocity
* Recognition of social momentum
* Understanding status games without playing them

5. Energy Management & Recovery Tactics
* Understanding personal energy patterns
* Quick recovery techniques between intense tasks
* Environmental design for optimal performance
* Micro-rest optimization
* State switching abilities

6. Pattern Recognition Across Domains
* Transferring insights between unrelated fields
* Spotting emerging trends early
* Recognition of historical rhymes
* Understanding cyclic behaviors
* Identifying underlying principles

7. Meta-Learning Optimization
* Learning how you specifically learn best


Recent Discussion

In our jobs as AI safety researchers, we think a lot about what it means to have reasonable beliefs and to make good decisions. This matters because we want to understand how powerful AI systems might behave. It also matters because we ourselves need to know how to make good decisions in light of tremendous uncertainty about how to shape the long-term future.

It seems to us that there is a pervasive feeling in this community that the way to decide which norms of rationality to follow is to pick the ones that win. When it comes to the choice between CDT vs. EDT vs. LDT…, we hear we can simply choose the one that gets the most utility. When we say that perhaps we ought to be...

Without a clear definition of "winning,"

This is part of the problem we're pointing out in the post. We've encountered claims of this "winning" flavor that haven't been made precise, so we survey different things "winning" could mean more precisely, and argue that they're inadequate for figuring out which norms of rationality to adopt.

Raemon
Thanks, this gave me the context I needed.
JesseClifton
The text says "A widely-used strategy for arguing for norms of rationality involves avoiding dominated strategies", which is true* and something we thought would be familiar to everyone who is interested in these topics. For example, see the discussion of Dutch book arguments in the SEP entry on Bayesianism and all of the LessWrong discussion on money pump/dominance/sure loss arguments (e.g., see all of the references in and comments on this post). But fair enough, it would have been better to include citations. We did include (potential) examples in this case.

Also, similarly to the above, I would think that encountering claims like "we ought to use some heuristic because it has worked well in the past" is commonplace among readers, so we didn't see the need to provide extensive evidence.

*Granted, we are using "dominated strategy" in the wide sense of "strategy that you are certain is worse than something else", which glosses over technical points like the distinction between dominated strategy and sure loss.
Anthony DiGiovanni
The key claim is: You can't evaluate which beliefs and decision theory to endorse just by asking "which ones perform the best?" Because the whole question is what it means to systematically perform better, under uncertainty. Every operationalization of "systematically performing better" we're aware of is either:

* Incomplete — like "avoiding dominated strategies", which leaves a lot unconstrained;
* A poorly motivated proxy for the performance we actually care about — like "doing what's worked in the past"; or
* Secretly smuggling in nontrivial non-pragmatic assumptions — like "doing what's worked in the past, not because that's what we actually care about, but because past performance predicts future performance".

This is what we meant to convey with this sentence: "On any way of making sense of those words, we end up either calling a very wide range of beliefs and decisions "rational", or reifying an objective that has nothing to do with our terminal goals without some substantive assumptions."

(I can't tell from your comment if you agree with all of that. But, if this was all obvious to you, great! But we've often had discussions where someone appealed to "which ones perform the best?" in a way that misses these points.)

Epistemic Status: I believe I am well-versed in this subject. I erred on the side of making claims that were too strong and allowing readers to disagree and start a discussion about precise points rather than trying to edge-case every statement. I also think that using memes is important because safety ideas are boring and anti-memetic. So let’s go!

Many thanks to @scasper, @Sid Black, @Neel Nanda, @Fabien Roger, @Bogdan Ionut Cirstea, @WCargo, @Alexandre Variengien, @Jonathan Claybrough, @Edoardo Pona, @Andrea_Miotti, Diego Dorn, Angélina Gentaz, Clement Dumas, and Enzo Marsot for useful feedback and discussions.

When I started this post, I began by critiquing the article A Long List of Theories of Impact for Interpretability, from Neel Nanda, but I later expanded the scope of my critique. Some ideas...

Andrew McKnight
Do you think putting extra effort into learning about existing empirical work while doing conceptual work would be sufficient for good conceptual work or do you think people need to be producing empirical work themselves to really make progress conceptually?

The former can be sufficient—e.g. there are good theoretical researchers who have never done empirical work themselves.

In hindsight I think "close conjunction" was too strong—it's more about picking up the ontologies and key insights from empirical work, which can be possible without following it very closely.

In the USA, the president isn't determined by a straight national vote. Instead, each state gets a certain number of Electoral College (EC) votes, and the candidate who reaches at least 270 of the 538 EC votes wins.

It's up to each state to decide how to allocate its EC votes. Most do “winner-takes-all,” but some, e.g., Maine and Nebraska, split them up.

California and Texas have the most EC votes of any state, with 54 and 40 votes respectively, so you would think they would get a lot of love from presidential candidates. Instead, they're mostly ignored—California will always be Blue, and Texas Red, so what's the point of pandering to them? This is clearly bad for Californians and Texans as their interests aren't listened to.

So why doesn't California switch to a proportional EC vote...

I think it would be better to form a big winner-take-all bloc. With proportional voting, the number of electoral votes at stake will be only a small fraction of the total, so the per-voter influence of CA and TX would probably remain below the national average.
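A back-of-envelope illustration of that claim, using deliberately rough, hypothetical numbers (not from the thread):

```python
# Rough comparison (hypothetical round numbers): votes needed to move one
# electoral vote under proportional allocation in CA vs. under winner-take-all
# in a closely contested swing state.
ca_electors, ca_voters = 54, 15_000_000          # very rough CA turnout
swing_electors, swing_margin = 19, 80_000        # a PA-sized state in a close year

votes_per_ev_ca = ca_voters / ca_electors            # ~280k votes per proportional EV
votes_per_ev_swing = swing_margin / swing_electors   # ~4k marginal votes per EV at stake

print(f"Proportional CA: ~{votes_per_ev_ca:,.0f} votes to shift one EV")
print(f"Winner-take-all swing state: ~{votes_per_ev_swing:,.0f} votes per EV at stake")
```

On numbers like these the per-vote leverage differs by roughly two orders of magnitude, which is the intuition behind the claim that proportional allocation alone probably wouldn't give CA or TX voters above-average influence.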

Measure
I remember there was a movement a while back to have states agree to award their electors to the national popular vote winner, but I'm not sure what came of that.
Charlie Steiner
So a proportional vote interstate compact? :) I like it - I think one could specify an automatic method for striking a fair bargain between states (and only include states that use that method in the bargain). Then you could have states join the compact asynchronously.

E.g. if the goal is to have the pre-campaign expected electors be the same, and Texas went 18/40 Biden in 2020 while California went 20/54 Trump in 2020, maybe in 2024 Texas assigns all its electors proportionally, while California assigns 49 electors proportionally and the remaining 5 by majority. That would cause the numbers to work out the same (plus or minus a rounding error).

Suppose Connecticut also wants to join the compact, but it's also a blue state. I think the obvious thing to do is to distribute the expected minority electors proportional to total elector count - if Connecticut has 7 electors, it's responsible for balancing 7/61 of the 18 minority electors that are being traded, or just about exactly 2 of them. But the rounding is sometimes awkward - if we lived in a universe where Connecticut had 9 electors instead, it would be responsible for just about exactly 2.5 minority electors, which is super awkward, especially if a lot of small states join and start accumulating rounding errors.

What you could do instead is specify a loss function: you take the variance of the proportion of electors assigned proportionally among the states that are on the 'majority' side of the deal, multiply that by a constant (probably something small like 0.05, but obviously you do some simulations and pick something more informed), add the squared rounding error of expected minority electors, and that's your measure for how imperfect the assignment of proportional electors to states is. Then you just pick the assignment that's least imperfect (see the sketch below).

Add in some automated escape hatches in case of change of major parties, change of voting system, or being superseded by a more ambitious interstate compact, and b
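A toy sketch of that loss function (our own illustration, with made-up vote shares rather than Charlie's numbers): brute-force over how many electors each majority-side state assigns proportionally, and pick the assignment minimizing the weighted variance of the proportional fractions plus the squared error in expected minority electors.

```python
# Toy version of the loss function described above: two blue states choose how
# many electors to assign proportionally so that the expected electors conceded
# to the minority side balance a target (here, 18 expected from Texas), while
# keeping the *fraction* assigned proportionally similar across states.
# Elector counts and minority vote shares are rough/made-up.
from itertools import product
import statistics

states = {"CA": (54, 20 / 54), "CT": (7, 0.40)}   # (electors, minority vote share)
target_minority = 18      # expected minority electors to balance from the other side
c = 0.05                  # weight on the variance term

def loss(assignment):
    fractions = [assignment[s] / states[s][0] for s in states]
    expected_minority = sum(assignment[s] * states[s][1] for s in states)
    return c * statistics.pvariance(fractions) + (expected_minority - target_minority) ** 2

candidates = (dict(zip(states, combo))
              for combo in product(*(range(e + 1) for e, _ in states.values())))
best = min(candidates, key=loss)
print(best, round(loss(best), 4))
```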
Shankar Sivarajan
I don't see any reason to structure this agreement as an open-ended compact other states can join instead of a bilateral agreement between just California and Texas as proposed. (The same reasoning applied to the National Popular Vote Interstate Compact would have its membership closed as soon as they reach a majority in electoral votes, and then completely disregard the votes of any state that didn't sign on, voting in whoever gets the most votes in member states.)

In this post, we’re going to use the diagrammatic notation of Bayes nets. However, we use the diagrams a little bit differently than is typical. In practice, such diagrams are usually used to define a distribution - e.g. the stock example diagram

The Slippery Sidewalk Model

... in combination with the five distributions $P[X_i \mid X_{\mathrm{pa}(i)}]$ (one for each node in the diagram), defines a joint distribution $P[X_1, \dots, X_5] = \prod_{i=1}^{5} P[X_i \mid X_{\mathrm{pa}(i)}]$

In this post, we instead take the joint distribution as given, and use the diagrams to concisely state properties of the distribution. For instance, we say that a distribution $P$ "satisfies" the diagram

 

if-and-only-if $P[X_1, \dots, X_n] = \prod_i P[X_i \mid X_{\mathrm{pa}(i)}]$, with the parent sets read off the diagram. (And once we get to approximation, we'll say that $P$ approximately satisfies the diagram, to within $\epsilon$, if-and-only-if $D_{KL}\left(P[X_1, \dots, X_n] \,\middle\|\, \prod_i P[X_i \mid X_{\mathrm{pa}(i)}]\right) \leq \epsilon$.)
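As a concrete (if trivial) illustration of this definition, here is a small numeric sketch, ours rather than the post's, for a three-node chain X1 → X2 → X3 over binary variables, using KL divergence to the factored distribution as the approximation measure:

```python
# Check how well an arbitrary joint distribution P over (X1, X2, X3) satisfies
# the chain diagram X1 -> X2 -> X3, i.e. compute D_KL(P || P[X1] P[X2|X1] P[X3|X2]).
# Exact satisfaction corresponds to a KL of zero.
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((2, 2, 2))
P /= P.sum()                                        # arbitrary joint over (X1, X2, X3)

P1 = P.sum(axis=(1, 2))                             # P[X1]
P2_given_1 = P.sum(axis=2) / P1[:, None]            # P[X2 | X1]
P23 = P.sum(axis=0)                                 # P[X2, X3]
P3_given_2 = P23 / P23.sum(axis=1, keepdims=True)   # P[X3 | X2]

Q = P1[:, None, None] * P2_given_1[:, :, None] * P3_given_2[None, :, :]
kl = float(np.sum(P * np.log(P / Q)))               # D_KL(P || factorization)

epsilon = 0.01
print(f"D_KL = {kl:.4f}; satisfies the chain to within {epsilon}: {kl <= epsilon}")
```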

The usage we’re interested in looks like:

  • State that some random variables satisfy several different diagrams
  • Derive some new diagrams which they satisfy

In other words, we want to write proofs diagrammatically - i.e....

Lorxus
Here's something that's kept bothering me on and off for the last few months: this graphical rule immediately breaks Markov equivalence. Specifically, two DAGs are Markov-equivalent only if they share an (undirected) skeleton. (Lemma 6.1 at the link.)

If the major/only thing we care about here regarding latential Bayes nets is that our grand joint distribution $P[X_G]$ factorize over (that is, satisfy) our DAG $G$ (and all of the DAGs we can get from it by applying the rules here), then by Thm 6.2 in the link above, $P$ is also globally/locally Markov wrt $G$. This holds even when $P[X_G] > 0$ is not guaranteed for some of the possible joint states in $X_G$, unlike Hammersley-Clifford would require.

That in turn means (Def 6.5) that there are some distributions $P$ such that $P[X_G]$ factors over $G$, but not over $G' = G$ plus a non-loopy extra arrow (where $G'$ trivially has the same vertices as $G$ does). Specifically: because $G$ and $G'$ don't (quite) share a skeleton, they can't be Markov-equivalent; because they aren't Markov-equivalent, $P$ no longer needs to be (locally/globally) Markov wrt $G'$ (and in fact there must exist some $P$ which explicitly break this); and because of that, such $P$ need not factor over $G'$.

Which I claim we should not want here, because (as always) we care primarily about preserving which joint probability distributions factorize over/satisfy which DAGs, and of course we probably don't get to pick whether our $P$ is one of the ones where that break in the chain of logic matters.

Proof that the quoted bookkeeping rule works, for the exact case:

  • The original DAG $G$ asserts $P[X_1, \dots, X_n] = \prod_i P[X_i \mid X_{\mathrm{pa}_G(i)}]$.
  • If $G'$ just adds an edge from $X_j$ to $X_k$, then $G'$ says $P[X_1, \dots, X_n] = \left(\prod_{i \neq k} P[X_i \mid X_{\mathrm{pa}_G(i)}]\right) P[X_k \mid X_{\mathrm{pa}_G(k)}, X_j]$.
  • The original DAG's assertion also implies $P[X_k \mid X_{\mathrm{pa}_G(k)}, X_j] = P[X_k \mid X_{\mathrm{pa}_G(k)}]$ (since $X_j$ is a non-descendant of $X_k$ in $G$), and therefore implies $G'$'s assertion.

The approximate case then follows by the new-and-improved Bookkeeping Theorem.
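A quick numeric sanity check of both points, again our own illustration rather than anything from the post: a joint built to satisfy X1 → X2 → X3 also satisfies the DAG with an extra X1 → X3 edge, while a generic joint satisfies the denser DAG but not the chain, which is the asymmetry discussed in the comment above.

```python
# Sanity check: the bookkeeping direction holds (a chain-satisfying P also satisfies
# chain + extra edge), but the converse fails (a generic P satisfies the denser
# DAG without satisfying the chain). Binary variables, (2, 2, 2) joints.
import numpy as np

rng = np.random.default_rng(1)

def kl_to_chain(P):
    """D_KL(P || P[X1] P[X2|X1] P[X3|X2])."""
    P1 = P.sum(axis=(1, 2))
    P2_given_1 = P.sum(axis=2) / P1[:, None]
    P23 = P.sum(axis=0)
    P3_given_2 = P23 / P23.sum(axis=1, keepdims=True)
    Q = P1[:, None, None] * P2_given_1[:, :, None] * P3_given_2[None, :, :]
    return float(np.sum(P * np.log(P / Q)))

def kl_to_chain_plus_edge(P):
    """D_KL(P || P[X1] P[X2|X1] P[X3|X1,X2]); always ~0, since this DAG is complete."""
    P12 = P.sum(axis=2, keepdims=True)
    Q = P12 * (P / P12)      # chain rule: this just reconstructs P
    return float(np.sum(P * np.log(P / Q)))

# Joint constructed to satisfy the chain exactly:
P1 = rng.random(2); P1 /= P1.sum()
P2g1 = rng.random((2, 2)); P2g1 /= P2g1.sum(axis=1, keepdims=True)
P3g2 = rng.random((2, 2)); P3g2 /= P3g2.sum(axis=1, keepdims=True)
P_chain = P1[:, None, None] * P2g1[:, :, None] * P3g2[None, :, :]

# Generic joint with no imposed structure:
P_generic = rng.random((2, 2, 2)); P_generic /= P_generic.sum()

for name, P in [("chain-satisfying P", P_chain), ("generic P", P_generic)]:
    print(f"{name}: KL to chain = {kl_to_chain(P):.6f}, "
          f"KL to chain+edge = {kl_to_chain_plus_edge(P):.6f}")
```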

Not sure where the disconnect/con...

As Americans know, the electoral college gives disproportionate influence to swing states, which means a vote in the extremely blue state of California was basically wasted in the 2024 election, as are votes in extremely red states like Texas, Oklahoma, and Louisiana. State legislatures have the Constitutional power to assign their state's electoral votes. So why don't the four states sign a compact to assign all their electoral votes in 2028 and future presidential elections to the winner of the aggregate popular vote in those four states? Would this even be legal?

The population of CA is 39.0M (54 electoral votes), and the population of the three red states is 38.6M (55 electoral votes). The combined bloc would control a massive 109 electoral votes, and would have gone...

When I was growing up most families in our neighborhood had the daily paper on their lawn. As getting your news on the internet became a better option for most folks, though, delivery became less efficient: fewer houses on the same route. Prices went up, more people cancelled, and decline continued. I wonder if we might see something similar with the power grid?

Solar panels keep getting cheaper. When we installed panels in 2019 they only made sense because of the combination of net metering and the relatively generous SREC II incentives. By the time we installed our second set of panels a few months ago net metering alone was enough to make it worth it.

Now, as I've said a few times, net metering is kind of absurd. The way it works here is that...

You say solar is getting cheaper, but it is only the panels that are getting cheaper. They will continue to get even cheaper, but this is not relevant to retrofitting individual houses, where the cost is already dominated by labor. As the cost of labor dominates, economies of scale in labor will be more relevant.

Douglas_Knight
To a first approximation, solar is legal for individual residences and illegal on a larger scale.
jefftk
First, don't we know that? It's a public company and it has to report what it spends. But more importantly, I do generally think getting a regulated monopoly like this to become more efficient is intractable, at least in the short to medium term.
Douglas_Knight
Maybe you could learn something by looking at the public filings, but you didn't look at them. By regulation, not by being public, it has to spend proportionate to its income, but whether it is spending on transmission or generation is a fiction dictated by the regulator. It may well be that its transmission operating costs are much lower than its price and that a change of prices would be viable without any improvement in efficiency. This is exactly how I would expect the company to set prices if it controlled the regulator: to extract as much money as possible on transmission to minimize competition. I don't know how corrupt the regulator is, but that ignorance is exactly my point.

Foresight Institute's AI Safety Grants Program added a new focus area in response to the continually evolving field. Moving forward, our funding ($1.5M-$2M annually) will be allocated across the following four focus areas:

 

  1. Automating AI-relevant research and forecasting
  • Scaling AI-enabled research to support safe AGI development
  • Scaling efficient forecasting methods relevant for safe AGI
  • Other approaches in this area

 

  2. Neurotech to integrate with or compete against AGI

  • Brain Computer Interfaces (BCI) to enhance human cognition or facilitate human-AGI collaboration
  • Whole Brain Emulations (WBE) which might function as human-like general intelligences that are more interpretable and alignable than AGI
  • Lo-fi emulations using behavioral and neural data with deep learning, potentially offering a cost-effective alternative to full WBEs
  • Other approaches in this area

 

  3. Security technologies for securing AI systems

  • Implementations of computer security techniques (including POLA, SeL4-inspired systems,
...

Hi there.

Quick question. I am using a few articles from LessWrong for a dissertation. Are there any mainstream articles/sources that reference LessWrong as being the catalyst/partial source for AI alignment research, researchers, and other academic literature? I think it's snobbish, or even discriminatory, to regard LessWrong as merely another online website. I was hoping to get some advice on how to formulate a paragraph justifying the citation of LessWrong.

Thanks.

Shankar Sivarajan
This is a little like the case of the Haruhi Problem, where a significant advance regarding the minimal length of superpermutations was made by an anonymous poster on 4chan. In that case, the ephemerality of the post was a reasonable concern, and the solution was that someone reposted the proof on OEIS (with "Anonymous 4chan Poster" as the first author), and then cited that. Here, you have a fixed URL, so you could just follow the established conventions for citing webpages. I don't think you need any special justification for it, nor do you need to treat this as anything other than "merely another online website" (you don't think it's "snobbish or discriminatory" to pretend it's something more because you count yourself among its users?).
Answer by AnthonyC
I'm curious which kinds of posts you're looking to cite, for what kinds of use, in a dissertation for what field.

Looking over the site as a whole, different posts should (IMO) be regarded as akin to primary sources, news sources, non-peer-reviewed academic papers, whitepapers, symposia, textbook chapters, or professional sources, depending on the author and epistemic status. In other words, this isn't "a site" for this purpose; it's a forum that hosts many kinds of content in a conveniently cross-referenceable format, some but not all of which is of a suitable standard for referencing for some but not all academic uses.

This at least should be similar to how your professors think about other kinds of citations. Someone might cite a doctor's published case study as part of the background research on some disease, or the NYT's publication of a summary of the Pentagon Papers in regards to the history of First Amendment jurisprudence, or a corporate whitepaper or other report as a source of data about an industry.
PhilosophicalSoul
AI alignment mostly; I'm seeking to bridge the gap between AI and law. LW has unique takes and often serves as the origin point for ideas on alignment (even if they aren't cited by mainstream authors). Whether this site's purpose is to be cited or not is debatable. On a pragmatic level, though, there are simply discussions here that can't be found anywhere else.

Depending on the posts, I think you could argue they're comparable to one of those other source types I listed.

keltan

I asked for further details on the 10th point and Claude listed a bunch of stuff I’ve absolutely never heard of. I’d say it’s probably related to meditation if I had to guess. Here’s that.

Claude: Let me break down Time Perception Management into its deeper components, as this is one of the most subtle yet powerful micro-skills...

  1. Understanding Psychological Time vs Clock Time
  • How different mental states alter time perception
    • Flow states can make hours feel like minutes
    • Anxiety can make minutes feel like hours
  • Learning to intentionally shift betwee