Best of LessWrong 2019

Power allows people to benefit from immoral acts without having to take responsibility or even be aware of them. The most powerful person in a situation may not be the most morally culpable, as they can remain distant from the actual "crime". If you're not actively looking into how your wants are being met, you may be unknowingly benefiting from something unethical.

trevor174
An example of the kind of high-level psychological manipulation you see from major newspapers: this single paragraph from the NYT's front-page response to Trump's victory may have been intended, in this order, to:

1. Depict Biden and Harris as failed opposition to polarization, rather than active contributors to it.
2. Remind readers that they originally 'promised' to heal divisions, diverting attention from the fact that many at the time understood this promise was superficial.
3. Insinuate that further polarization might be inescapable, with institutions like the NYT powerless against it rather than leading it.
4. Directly state that Biden and Harris were weakened by their failure to understand and "channel" polarization.
5. Conclude with a true and helpful statement describing how Trump was empowered by polarization, leaving out how the news corporations' business model originally incentivized him to win the 2016 Republican primary via notoriety maximization.

I probably didn't understand that entire paragraph and its dynamics as well as the author did, but taken in context with the rest of the article, the author's audacity is pretty clear. Language is sufficiently flexible for such a density of spells to be loaded into a paragraph this short, and news corporations have become sophisticated at operating within the constraint of maintaining a veneer of neutrality.

Polarization and news corporations are like climate change and fossil fuel companies: there's an incredible difference in statements and behavior between institutions incentivized to understand a phenomenon (much of academia actually still does a great job researching polarization and treating it as something to mitigate) and institutions incentivized to maximize it and profit off the results.
Fun unimportant thought: as tunnel density increases, there's a phase shift in how warfare works (in the area with dense tunnels).

Consider e.g. a city that has a network of tunnels/catacombs/etc. underneath. Attackers can advance on the surface, and/or they can advance under the ground. For tunnel networks of historically typical density, it's better to advance on the surface. Why? Because (a) the networks are sparse enough that the defenders can easily man every chokepoint underground, and (b) not being able to use airplanes or long-ranged weaponry underground seems to advantage the defender more than the attacker (e.g. you can't use artillery to soften up defenders, can't use tanks, can't scout or bomb from the air). On the other hand, your attacking forces can get real close before they themselves can be shot at -- but this doesn't seem to be sufficient compensation.

Well, as the density of the network increases, eventually factor (a) reverses. Imagine a network so dense that in a typical 1km stretch of frontline there are 100 separate tunnels passing beneath, such that you'd need at least 100 defensive chokepoints or else your line would have an exploitable hole. Not enough? Imagine that it's 1000... The point is, at some point it becomes more difficult to defend underground than to defend on the surface. Beyond that point, fighting would mostly happen underground, and it would be... infantry combat, but in three dimensions? Lots of isolated squads of men exploring dark tunnel networks, occasionally clashing with each other, forming impromptu chokepoints and seeking to outflank?

Factor (b) might reverse as well. Perhaps it's entangled with factor (a). In a superdense tunnel network, I could see it being the case that the advantage of the attacker (being able to advance unseen, and to advance without being caught in fields of fire by machine guns and artillery, until they are literally in the same hallway as the enemy) outweighs the advantage of the defender (good cover e
One question I have about the Mechanistic Interpretability safety case sketch from Roger Grosse: how do you actually find features of high concern?

Features of high concern are things like "intent to carry out a harmful plan". By definition, they very rarely activate on benign deployment outputs. This must be the case, as "if the features [of high concern] activate more than a handful of times in deployment (internal or external), we discontinue the model and roll back to a weaker model." [M1.1]

They propose to find these features [S1.1] by performing dictionary learning of some form, and then some combination of:

* autointerp on the learned features -- which currently requires examples of the feature activating. I guess you can also look at attribution of bad outcomes to features?
* looking at what features activate "when the model is instructed or perturbed to be misaligned."

Generally, dictionary learning only learns features that actually occur on some inputs, so you need to do something outside of the normal deployment distribution. Some ideas (many alluded to in the document):

* Learn the features on the base model (or an 'agentic-only' version of the model) and then hope they transfer to the safety-trained model.
* Learn the features by gathering the activations on off-policy transcripts of harmful behavior. This is what Fabien Roger does in coup-probes.
* Prompting the model to behave badly. This probably can't be sufficient for features of high concern -- you don't want your model to take catastrophic actions even when instructed to do so! Maybe it works if you catch jailbreaks with a separate mechanism that you disable?
* Perturbing the model to behave badly. For example:
  * Something in the genre of Mack and Turner's mechanistically eliciting latent behaviors -- e.g. this works to discover sleeper agents.
  * Maybe steering for related features? It's not clear to me how you do dictionary learning on a steered model and then apply those features to the uns
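The off-policy idea above (fit a probe on activations from scripted harmful transcripts, then flag deployment activations) can be sketched with a toy difference-of-means probe. Everything here is a hypothetical stand-in -- the synthetic "activations", the 64-dimensional width, and the 3-sigma threshold are invented for illustration, not the actual coup-probes setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical residual-stream width

# Synthetic stand-ins for activations gathered on benign completions vs.
# off-policy transcripts where harmful behavior was scripted. We plant a
# ground-truth "concerning" direction so the demo is checkable.
true_direction = rng.normal(size=d)
true_direction /= np.linalg.norm(true_direction)
benign = rng.normal(size=(200, d))
harmful = rng.normal(size=(200, d)) + 4.0 * true_direction

# Difference-of-means probe: a crude, dictionary-free stand-in for a
# "feature of concern" direction, fit without any on-policy bad behavior.
probe = harmful.mean(axis=0) - benign.mean(axis=0)
probe /= np.linalg.norm(probe)

def score(acts):
    return acts @ probe

# Flag anything far outside the benign activation distribution.
threshold = score(benign).mean() + 3 * score(benign).std()
flagged = (score(harmful) > threshold).mean()
print(f"fraction of harmful-transcript activations flagged: {flagged:.2f}")
```

The point of the sketch is only that such a probe can fire on the concerning distribution while staying quiet on benign inputs -- whether real "features of high concern" transfer this cleanly from off-policy data is exactly the open question in the quick take.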
Mindmeld

In theory, AIs can transmit information far faster and more directly than humans: they can directly send weight/activation vectors to one another. The most important variable in whether entities (cells, organisms, polities, companies, ideologies, empires, etc.) stay individuals or amalgamate into a superorganism is communication bandwidth & copy fidelity. Both of these differ by many orders of magnitude between humans and AIs. At some point, mere communication becomes a literal melding of minds. It seems quite plausible, then, that AIs will tend to mindmeld if left alone.

Current state-of-the-art fibre optic cables can transmit up to 10 terabits a second. That's probably a wild overestimate for AI communication, though. More relevant bottlenecks are limits on processing information [plausibly more in the megabits range] and limits on the transferability of activation vectors (though training could improve this).
keltan

I recently saw a tweet that asked GPT what micro-skills a person should master. I asked Claude a similar question, though I had preloaded the conversation with "things a modern-day Aristotle would teach a 24yo student". I think this is probably a valuable exercise to undertake yourself. I'm providing my prompt and Claude's response as an 'appetiser' to get you interested in this task.

— —

1. Cold Reading & Behavioral Analysis
   * Understanding micro-expressions
   * Picking up on subtle vocal tone changes
   * Recognizing behavioral patterns in groups
   * Reading body language clusters rather than isolated gestures
   * This isn't about manipulation, but rather developing deeper empathy and understanding
2. Strategic Silence & Information Management
   * Knowing when not to speak
   * Understanding information asymmetry
   * The power of selective disclosure
   * Using thoughtful pauses in conversation
   * The art of asking rather than telling
3. Scenario Planning & Secondary Consequence Analysis
   * Ability to see 3-4 moves ahead in decisions
   * Understanding cascade effects
   * Identifying hidden risks and opportunities
   * Planning for both success and failure scenarios
   * Recognition of system dynamics
4. Social Capital Accounting
   * Understanding relationship networks
   * Tracking favors and obligations implicitly
   * Building genuine reciprocity
   * Recognition of social momentum
   * Understanding status games without playing them
5. Energy Management & Recovery Tactics
   * Understanding personal energy patterns
   * Quick recovery techniques between intense tasks
   * Environmental design for optimal performance
   * Micro-rest optimization
   * State switching abilities
6. Pattern Recognition Across Domains
   * Transferring insights between unrelated fields
   * Spotting emerging trends early
   * Recognition of historical rhymes
   * Understanding cyclic behaviors
   * Identifying underlying principles
7. Meta-Learning Optimization
   * Learning how you specifically learn best

Popular Comments

Recent Discussion

Epistemic Status: I believe I am well-versed in this subject. I erred on the side of making claims that were too strong and allowing readers to disagree and start a discussion about precise points rather than trying to edge-case every statement. I also think that using memes is important because safety ideas are boring and anti-memetic. So let’s go!

Many thanks to @scasper, @Sid Black, @Neel Nanda, @Fabien Roger, @Bogdan Ionut Cirstea, @WCargo, @Alexandre Variengien, @Jonathan Claybrough, @Edoardo Pona, @Andrea_Miotti, Diego Dorn, Angélina Gentaz, Clement Dumas, and Enzo Marsot for useful feedback and discussions.

When I started this post, I began by critiquing the article A Long List of Theories of Impact for Interpretability by Neel Nanda, but I later expanded the scope of my critique. Some ideas...

Andrew McKnight
Do you think putting extra effort into learning about existing empirical work while doing conceptual work would be sufficient for good conceptual work or do you think people need to be producing empirical work themselves to really make progress conceptually?

The former can be sufficient—e.g. there are good theoretical researchers who have never done empirical work themselves.

In hindsight I think "close conjunction" was too strong—it's more about picking up the ontologies and key insights from empirical work, which can be possible without following it very closely.

In the USA, the president isn't determined by a straight national vote. Instead, each state gets a certain number of Electoral College (EC) votes, and the candidate who wins a majority (at least 270 of the 538 EC votes) wins.

It's up to each state to decide how to allocate its EC votes. Most do “winner-takes-all,” but some, e.g., Maine and Nebraska, split them up.

California and Texas have the most EC votes of any state, with 54 and 40 votes respectively, so you would think they would get a lot of love from presidential candidates. Instead, they're mostly ignored—California will always be Blue, and Texas Red, so what's the point of pandering to them? This is clearly bad for Californians and Texans as their interests aren't listened to.

So why doesn't California switch to a proportional EC vote...

I think it would be better to form a big winner-takes-all bloc. With proportional voting, the number of electoral votes actually at stake would be only a small fraction of the total, so the per-voter influence of CA and TX would probably remain below the national average.
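A quick back-of-the-envelope sketch of why proportional allocation shrinks the stakes; the 60% baseline vote share and the 1.5-point swing are made-up illustrative numbers:

```python
# Hypothetical two-party split in California, for illustration only.
def proportional_allocation(ev_total, dem_share):
    """Round one party's proportional share to the nearest whole elector."""
    dem = round(ev_total * dem_share)
    return dem, ev_total - dem

# Under winner-takes-all, a statewide shift can swing all 54 CA votes.
# Under proportional allocation, the same shift moves roughly one elector.
before = proportional_allocation(54, 0.600)
after = proportional_allocation(54, 0.615)  # a 1.5-point swing
print("before:", before, "after:", after, "net change:", after[0] - before[0])
```

So under proportional rules, only the one or two marginal electors are ever contested in a given cycle, which is the "small fraction of the total" in the comment above.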

Measure
I remember there was a movement a while back to have states agree to award their electors to the national popular vote winner, but I'm not sure what came of that.
Charlie Steiner
So a proportional vote interstate compact? :) I like it - I think one could specify an automatic method for striking a fair bargain between states (and only include states that use that method in the bargain). Then you could have states join the compact asynchronously. E.g. if the goal is to have the pre-campaign expected electors be the same, and Texas went 18/40 Biden in 2020 while California went 20/54 Trump in 2020, maybe in 2024 Texas assigns all its electors proportionally, while California assigns 49 electors proportionally and the remaining 5 by majority. That would cause the numbers to work out the same (plus or minus a rounding error).

Suppose Connecticut also wants to join the compact, but it's also a blue state. I think the obvious thing to do is to distribute the expected minority electors proportional to total elector count - if Connecticut has 7 electors, it's responsible for balancing 7/61 of the 18 minority electors that are being traded, or just about exactly 2 of them. But the rounding is sometimes awkward - if we lived in a universe where Connecticut had 9 electors instead, it would be responsible for just about exactly 2.5 minority electors, which is super awkward, especially if a lot of small states join and start accumulating rounding errors.

What you could do instead is specify a loss function: you take the variance of the proportion of electors assigned proportionally among the states that are on the 'majority' side of the deal, multiply that by a constant (probably something small like 0.05, but obviously you do some simulations and pick something more informed), add the squared rounding error of expected minority electors, and that's your measure for how imperfect the assignment of proportional electors to states is. Then you just pick the assignment that's least imperfect.

Add in some automated escape hatches in case of change of major parties, change of voting system, or being superseded by a more ambitious interstate compact, and b
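The loss function described above can be made concrete with a brute-force sketch. The setup is invented for illustration: two blue states, CA (54 electors, 20/54 minority share as in the comment) and CT (7 electors, an assumed 40% minority share), together balancing 18 expected minority electors from the red side:

```python
import itertools
import statistics

# Hypothetical states on the 'majority' (blue) side: (name, electors, minority share).
states = [("CA", 54, 20 / 54), ("CT", 7, 0.40)]
target_minority = 18  # expected minority electors to balance from the red side
c = 0.05              # weight on the fairness (variance) term, as in the comment

def loss(assignment):
    """c * variance of proportional fractions + squared rounding error."""
    fracs = [k / n for k, (_, n, _) in zip(assignment, states)]
    expected = sum(k * share for k, (_, _, share) in zip(assignment, states))
    return c * statistics.pvariance(fracs) + (expected - target_minority) ** 2

# Brute-force the least-imperfect assignment of proportional electors.
best = min(
    itertools.product(*(range(n + 1) for _, n, _ in states)),
    key=loss,
)
print({name: k for (name, _, _), k in zip(states, best)}, f"loss={loss(best):.4f}")
```

With these made-up numbers the optimizer trades a little variance for a near-exact expected-elector match, which is exactly the tension the loss function is meant to arbitrate.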
Shankar Sivarajan
I don't see any reason to structure this agreement as an open-ended compact other states can join instead of a bilateral agreement between just California and Texas as proposed. (The same reasoning applied to the National Popular Vote Interstate Compact would have its membership closed as soon as they reach a majority in electoral votes, and then completely disregard the votes of any state that didn't sign on, voting in whoever gets the most votes in member states.)

In this post, we’re going to use the diagrammatic notation of Bayes nets. However, we use the diagrams a little bit differently than is typical. In practice, such diagrams are usually used to define a distribution - e.g. the stock example diagram

The Slippery Sidewalk Model

... in combination with the five conditional distributions P[X_i | X_pa(i)] (one per node), defines a joint distribution P[X_1, ..., X_5] = ∏_i P[X_i | X_pa(i)].

In this post, we instead take the joint distribution as given, and use the diagrams to concisely state properties of the distribution. For instance, we say that a distribution P “satisfies” the diagram

 

if-and-only-if P[X] = ∏_i P[X_i | X_pa(i)], i.e. P factors according to the diagram. (And once we get to approximation, we’ll say that P approximately satisfies the diagram, to within ε, if-and-only-if D_KL(P[X] || ∏_i P[X_i | X_pa(i)]) ≤ ε.)
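One natural way to cash out "approximately satisfies, to within ε" is the KL divergence between P and its factorization over the diagram. Here is a numeric sketch (my own toy construction, not from the post) for the chain diagram X1 → X2 → X3 over binary variables:

```python
import numpy as np

def dkl_to_chain(P):
    """D_KL between a joint P[x1,x2,x3] and its factorization over X1 -> X2 -> X3."""
    P1 = P.sum(axis=(1, 2))
    P2 = P.sum(axis=(0, 2))
    P2g1 = P.sum(axis=2) / P1[:, None]   # P[X2 | X1]
    P3g2 = P.sum(axis=0) / P2[:, None]   # P[X3 | X2]
    Q = P1[:, None, None] * P2g1[:, :, None] * P3g2[None, :, :]
    return float((P * np.log(P / Q)).sum())

# A distribution built from the chain's factors satisfies the diagram exactly...
p1 = np.array([0.3, 0.7])
p2g1 = np.array([[0.6, 0.4], [0.2, 0.8]])
p3g2 = np.array([[0.5, 0.5], [0.9, 0.1]])
P_exact = p1[:, None, None] * p2g1[:, :, None] * p3g2[None, :, :]

# ...while a generic joint only satisfies it approximately, to within eps = D_KL.
rng = np.random.default_rng(0)
P_generic = rng.random((2, 2, 2))
P_generic /= P_generic.sum()

print("exact chain:  D_KL =", dkl_to_chain(P_exact))
print("generic joint: D_KL =", dkl_to_chain(P_generic))
```

The exact case gives divergence 0 (the diagram is satisfied), while the generic joint gives a positive ε quantifying how badly the diagram's conditional independences fail.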

The usage we’re interested in looks like:

  • State that some random variables satisfy several different diagrams
  • Derive some new diagrams which they satisfy

In other words, we want to write proofs diagrammatically - i.e....

Lorxus
Here's something that's kept bothering me on and off for the last few months: this graphical rule immediately breaks Markov equivalence. Specifically, two DAGs are Markov-equivalent only if they share an (undirected) skeleton. (Lemma 6.1 at the link.)

If the major/only thing we care about here regarding latential Bayes nets is that our grand joint distribution P[X_G] factorize over (that is, satisfy) our DAG G (and all of the DAGs we can get from it by applying the rules here), then by Thm 6.2 in the link above, P is also globally/locally Markov wrt G. This holds even when P[X_G] > 0 is not guaranteed for some of the possible joint states in X_G, unlike Hammersley-Clifford would require.

That in turn means that (Def 6.5) there are some distributions P such that P[X_G] factors over G, but not over G′ = G + a non-loopy extra arrow (where G′ trivially has the same vertices as G does); specifically, because G and G′ don't (quite) share a skeleton, they can't be Markov-equivalent, and because they aren't Markov-equivalent, P no longer needs to be (locally/globally) Markov wrt G′ (and in fact there must exist some P which explicitly break this), and because of that, such P need not factor over G′. Which I claim we should not want here, because (as always) we care primarily about preserving which joint probability distributions factorize over/satisfy which DAGs, and of course we probably don't get to pick whether our P is one of the ones where that break in the chain of logic matters.

Proof that the quoted bookkeeping rule works, for the exact case:

  • The original DAG G asserts P[X] = ∏_i P[X_i | X_pa(i)]
  • If G′ just adds an edge from X_j to X_k, then G′ says P[X] = (∏_{i≠k} P[X_i | X_pa(i)]) · P[X_k | X_pa(k), X_j]
  • The original DAG's assertion P[X] = ∏_i P[X_i | X_pa(i)] also implies P[X_k | X_pa(k), X_j] = P[X_k | X_pa(k)], and therefore implies G′'s assertion P[X] = (∏_{i≠k} P[X_i | X_pa(i)]) · P[X_k | X_pa(k), X_j].

The approximate case then follows by the new-and-improved Bookkeeping Theorem.
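A numeric sanity check of the bookkeeping rule, on a toy example of my own with hypothetical numbers: take G to be A → B with C unconnected, build a P that satisfies G, and verify it also satisfies G′ = G plus an added edge A → C:

```python
import numpy as np

# G asserts P[A,B,C] = P[A] P[B|A] P[C]; build such a P from chosen factors.
pa = np.array([0.4, 0.6])
pb_a = np.array([[0.7, 0.3], [0.1, 0.9]])
pc = np.array([0.25, 0.75])
P = pa[:, None, None] * pb_a[:, :, None] * pc[None, None, :]

# G' (with the extra edge A -> C) asserts P[A,B,C] = P[A] P[B|A] P[C|A].
PA = P.sum(axis=(1, 2))
PB_A = P.sum(axis=2) / PA[:, None]
PC_A = P.sum(axis=1) / PA[:, None]
Q = PA[:, None, None] * PB_A[:, :, None] * PC_A[:, None, :]

# A distribution satisfying G also satisfies G': the factorizations coincide.
print("P satisfies G' as well:", np.allclose(P, Q))
```

Here C is independent of A under G, so P[C|A] collapses to P[C] and G′'s factorization reproduces P exactly -- the numeric counterpart of the conditional-independence step in the proof above.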

Not sure where the disconnect/con...

As Americans know, the electoral college gives disproportionate influence to swing states, which means a vote in the extremely blue state of California was basically wasted in the 2024 election, as are votes in extremely red states like Texas, Oklahoma, and Louisiana. State legislatures have the Constitutional power to assign their state's electoral votes. So why don't the four states sign a compact to assign all their electoral votes in 2028 and future presidential elections to the winner of the aggregate popular vote in those four states? Would this even be legal?

The population of CA is 39.0M (54 electoral votes), and the population of the three red states is 38.6M (55 electoral votes). The combined bloc would control a massive 109 electoral votes, and would have gone...

When I was growing up most families in our neighborhood had the daily paper on their lawn. As getting your news on the internet became a better option for most folks, though, delivery became less efficient: fewer houses on the same route. Prices went up, more people cancelled, and decline continued. I wonder if we might see something similar with the power grid?

Solar panels keep getting cheaper. When we installed panels in 2019 they only made sense because of the combination of net metering and the relatively generous SREC II incentives. By the time we installed our second set of panels a few months ago net metering alone was enough to make it worth it.

Now, as I've said a few times, net metering is kind of absurd. The way it works here is that...

You say solar is getting cheaper, but it is only the panels that are getting cheaper. They will continue to get even cheaper, but this is not relevant to retrofitting individual houses, where the cost is already dominated by labor. As the cost of labor dominates, economies of scale in labor will be more relevant.

Douglas_Knight
To a first approximation, solar is legal for individual residences and illegal on a larger scale.
jefftk
First, don't we know that? It's a public company and it has to report what it spends. But more importantly, I do generally think getting a regulated monopoly like this to become more efficient is intractable, at least in the short to medium term.
Douglas_Knight
Maybe you could learn something by looking at the public filings, but you didn't look at them. By regulation, not by being public, it has to spend proportionate to its income, but whether it is spending on transmission or generation is a fiction dictated by the regulator. It may well be that its transmission operating costs are much lower than its price and that a change of prices would be viable without any improvement in efficiency. This is exactly how I would expect the company to set prices if it controlled the regulator: to extract as much money as possible on transmission to minimize competition. I don't know how corrupt the regulator is, but that ignorance is exactly my point.

Foresight Institute's AI Safety Grants Program added a new focus area in response to the continually evolving field. Moving forward, our funding ($1.5M-$2M annually) will be allocated across the following four focus areas:

 

  1. Automating AI-relevant research and forecasting
  • Scaling AI-enabled research to support safe AGI development
  • Scaling efficient forecasting methods relevant for safe AGI
  • Other approaches in this area

 

2. Neurotech to integrate with or compete against AGI

  • Brain Computer Interfaces (BCI) to enhance human cognition or facilitate human-AGI collaboration
  • Whole Brain Emulations (WBE) which might function as human-like general intelligences that are more interpretable and alignable than AGI
  • Lo-fi emulations using behavioral and neural data with deep learning, potentially offering a cost-effective alternative to full WBEs
  • Other approaches in this area

 

3. Security technologies for securing AI systems

  • Implementations of computer security techniques (including POLA, SeL4-inspired systems,
...

Hi there.

Quick question. I am using a few articles from LessWrong for a dissertation. Are there any mainstream articles/sources that reference LessWrong as being the catalyst/partial source for AI alignment, researchers, and other academic literature? I think it's snobbish, or even discriminatory, to regard LessWrong as merely another online website. I was hoping to get some advice on how to formulate a paragraph justifying the citation of LessWrong.

Thanks.

Shankar Sivarajan
This is a little like the case of the Haruhi Problem, where a significant advance on the minimal length of superpermutations was made by an anonymous poster on 4chan. In that case, the ephemerality of the post was a reasonable concern, and the solution was that someone reposted the proof on the OEIS (with "Anonymous 4chan Poster" as the first author), and then cited that. Here, you have a fixed URL, so you could just follow the established conventions for citing webpages. I don't think you need any special justification for it, nor do you need to treat this as anything other than "merely another online website" (you don't think it's "snobbish or discriminatory" to pretend it's something more because you count yourself among its users?).
Answer by AnthonyC
I'm curious which kinds of posts you're looking to cite, and for what kinds of use in a dissertation in what field. Looking over the site as a whole, different posts should (IMO) be regarded as akin to primary sources, news sources, non-peer-reviewed academic papers, whitepapers, symposia, textbook chapters, or professional sources, depending on the author and epistemic status. In other words, this isn't "a site" for this purpose; it's a forum that hosts many kinds of content in a conveniently cross-referenceable format, some but not all of which is of a suitable standard for referencing for some but not all academic uses. This, at least, should be familiar from how your professors think about other kinds of citations. Someone might cite a doctor's published case study as part of the background research on some disease, or the NYT's publication of a summary of the Pentagon Papers in regards to the history of First Amendment jurisprudence, or a corporate whitepaper or other report as a source of data about an industry.
PhilosophicalSoul
AI alignment mostly; I'm seeking to bridge the gap between AI and law. LessWrong has unique takes, and often serves as the origin point for ideas on alignment (even if they aren't cited by mainstream authors). Whether this site's purpose is to be cited or not is debatable. On a pragmatic level, though, there are simply discussions here that can't be found anywhere else.

Depending on the posts, I think you could argue they're comparable to one of those other source types I listed.

keltan

I asked for further details on the 10th point, and Claude listed a bunch of stuff I've absolutely never heard of. I'd say it's probably related to meditation, if I had to guess. Here it is.

Claude: Let me break down Time Perception Management into its deeper components, as this is one of the most subtle yet powerful micro-skills...

  1. Understanding Psychological Time vs Clock Time
  • How different mental states alter time perception
    • Flow states can make hours feel like minutes
    • Anxiety can make minutes feel like hours
  • Learning to intentionally shift between...

(This was inspired by a conversation with Alex Altair and other fellows as part of the agent foundations fellowship, funded by the LTFF)

(Also: after I had essentially finished this piece, I was pointed toward the post Natural abstractions are observer-dependent which covers a lot of similar ground. I've decided to post this one anyway because it comes at things from a slightly different angle.)

Here is a nice summary statement of the Natural Abstractions Hypothesis (NAH):

The Natural Abstraction Hypothesis, proposed by John Wentworth, states that there exist abstractions (relatively low-dimensional summaries which capture information relevant for prediction) which are "natural" in the sense that we should expect a wide variety of cognitive systems to converge on using them.

I think that this is not true and that whenever cognitive...

Your reaction seems fair, thanks for your thoughts! It's a good suggestion to add an epistemic status - I'll be sure to add one next time I write something like this.

Alfred Harwood
Got it, that makes sense. I think I was trying to get at something like this when I was talking about constraints/selection pressure (a system has less need to use abstractions if its compute is unconstrained or there is no selection pressure in the 'produce short/quick programs' direction) but your explanation makes this clearer. Thanks again for clearing this up!