Quick Takes


The recent Gordon Seidoh Worley/Said Achmiz blowup and the subsequent threads (1, 2) it spawned, along with my own involvement in them, got me thinking a bit about this site, on a more nostalgic/meta level.

To be clear, I continue to endorse my belief that Said is right about most of the issues he identifies, about the epistemic standards of this site being low, and about the ever-present risk that absent consistent and pointed (reasonable) criticism, comment sections and the site culture will inevitably devolve into happy death spirals over applause lights.

And ... (read more)


Out of curiosity, what evidence would change your mind?

This one seems pretty easy. If multiple notable past contributors speak out themselves and say that they stopped contributing to LW because of individual persistently annoying commenters, naming Said as one of them, that would be pretty clear evidence. Also socially awkward of course. But the general mindset of old-school internet forum discourse is that stuff people say publicly under their own accounts exists and claimed backchannel communications are shit someone made up to win an argument.

Garrett Baker
Why alarming? I don't think LessWrong is the hub for any one sort of feedback, but on balance it seems like a good source of feedback. Certainly Said & his approach isn't the best possible response in every circumstance; I'm sure even he would agree with that, even if he thinks there should be more of it.
Three-Monkey Mind
Because it used to be the obvious place to post something rationality-related where one could get good critical feedback, up to and including “you’re totally wrong, here’s why” or “have you considered…?” (where considering the thing totally invalidates or falsifies the idea I was trying to put forward).

As part of the alignment faking paper, I hosted a website with ~250k transcripts from our experiments (including transcripts with alignment-faking reasoning). I didn't include a canary string (which was a mistake).[1]

The current state is that the website has a canary string, a robots.txt, and a terms of service which prohibits training. The GitHub repo which hosts the website is now private. I'm tentatively planning on putting the content behind Cloudflare Turnstile, but this hasn't happened yet.
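
For anyone hosting something similar, here is a minimal sketch of those two mitigations, the robots.txt rules and a canary-string check. This is an illustration rather than the site's actual setup: the canary value, directory layout, and file pattern are placeholders, and the listed user agents are just commonly cited AI crawlers.

```python
from pathlib import Path

# Placeholder canary string: the real one is a specific published GUID, not reproduced here.
CANARY = "EXAMPLE-CANARY-STRING-DO-NOT-TRAIN"

# robots.txt rules aimed at crawlers commonly used to collect training data.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

def pages_missing_canary(site_dir: str) -> list[Path]:
    """Return published transcript pages that lack the canary string."""
    return [
        page
        for page in Path(site_dir).rglob("*.html")
        if CANARY not in page.read_text(encoding="utf-8", errors="ignore")
    ]

if __name__ == "__main__":
    site = Path("site")  # hypothetical output directory for the static site
    site.mkdir(exist_ok=True)
    (site / "robots.txt").write_text(ROBOTS_TXT)
    print(pages_missing_canary(str(site)))
```

Of course, robots.txt only affects crawlers that choose to respect it, which is part of why the Cloudflare Turnstile step still seems worth doing.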

The data is also hosted in zips in a publicly accessible Goog... (read more)


You could handle both old and new scrapes by moving the content to a different URL, changing the original URL to a link to the new URL, and protecting only the new URL from scraping.

peterbarnett
Have you contacted the big AI companies (OpenAI, Anthropic, GDM, Meta?) and asked them if they can remove this from their scrapes?
ryan_greenblatt
Something tricky about this is that researchers might want to display their data/transcripts in a particular way. So, the guide should ideally support this sort of thing. Not sure how this would interact with the 1-hour criterion.

Technical AI alignment/control is still impactful; don't go all-in on AI gov!

  • Liability incentivizes safeguards, even absent regulation;
  • Cheaper, more effective safeguards make it easier for labs to meet safety standards;
  • Concrete, achievable safeguards give regulation teeth.

There are definitely still benefits to doing alignment research, but this only justifies the idea that doing alignment research is better than doing nothing.

IMO the thing that matters (for an individual making decisions about what to do with their career) is something more like "on the margin, would it be better to have one additional person do AI governance or alignment/control?"

I happen to think that given the current allocation of talent, on-the-margin it's generally better for people to choose AI policy. (Particularly efforts to contribute technical ex... (read more)

Felix C.
What are your thoughts on the relative value of AI governance/advocacy vs. technical research? It seems to me that many of the technical problems are essentially downstream of politics; that intent alignment could be solved, if only our race dynamics were mitigated, regulation were used to slow capabilities research, and alignment research were given funding/strategic priority.
Thomas Larsen
Also there's a good chance AI gov won't work, and labs will just have a very limited safety budget to implement their best-guess mitigations. And even if AI gov does work and we get a large budget, we still need to actually solve alignment.

On @Gordon Seidoh Worley’s recent post, “Religion for Rationalists”, the following exchange took place:

Kabir Kumar:

Rationality/EA basically is a religion already, no?

Gordon Seidoh Worley:

No, or so say I.

I prefer not to adjudicate this on some formal basis. There are several attempts by academics to define religion, but I think it’s better to ask “does rationality or EA look sufficiently like other things that are definitely religions that we should call them religions”.

I say “no” on the basis of a few factors: …

Said Achmiz:

EA is [a religion]

... (read more)
Said Achmiz
I totally agree that being able to change your mind is good, and that the important thing is that you end up in the right place. (Although I think that your caveat “as long as you keep working to craft better models of those topics and not just flail about randomly” is doing a lot of work in the thesis you express; and it seems to me that this caveat has some requirements that make most of the specifics of what you endorse here… improbable. That is: if you don’t track your own path, then it’s much more likely that you’re flailing, and much more likely that you will flail; so what you gain in “lightness”, you lose in directionality. Indeed, I think you likely lose more than you gain.) However. If you change your mind about something, then it behooves you to not then behave as if your previous beliefs are bizarre, surprising, and explainable only as deliberate insults or other forms of bad faith.
Vladimir_Nesov
Subjectively, it seems the things that are important to track are facts and observations, possibly specific sources (papers, videos), but not your own path or provisional positions expressed at various points along the way. So attention to detail, but not the detail of your own positions or externally expressed statements about them, that's rarely of any value. You track the things encountered along your own path, so long as they remain relevant and not otherwise, but never the path itself. That's the broadly accepted norm. My point is that I think it's a bad norm that damages effectiveness of lightness and does nothing useful.

Alright. Well, I guess we disagree here. I think the broadly accepted norm is good, and what you propose is bad (for the reason I describe in the grandparent comment).

Saw a post about a "left-leaning/liberal" rationalist Discord channel earlier this month, but the invitation link had expired when I tried it today, and I could not find the original post anymore. Could anyone in that group post the invitation link again if possible? Much appreciated. (Apologies in advance if this is something that is not allowed.)

I've launched Forecast Labs, an organization focused on using AI forecasting to help reduce AI risk.

Our initial results are promising. We have an AI model that is outperforming superforecasters on the Manifold Markets benchmark, as evaluated by ForecastBench. You can see a summary of the results at our website: https://www.forecastlabs.org/results.

This is just the preliminary scaffolding, and there's significant room for improvement. The long-term vision is to develop these AI forecasting capabilities to a point where we can construct large-scale causal mo... (read more)

If every private message on LessWrong or cold email you have ever written has received a response, you are either spending too much time writing them, are very young, or aren't sending enough of them.

Roman Malov
I would also add: if your messages in a group chat spark lots of conversation every time, that chat is underexploited (you are not sending enough / are thinking too much). (Conversely, if your messages are ignored, spend a little more time on them.)
Roman Malov
Why would that affect the frequency of the responses? How would receivers even deduce the age of the sender?

I was just trying to close a loophole that Umeshisms often seem to leave open, but I think this made my statement more confusing: if you are 15 years old (the particular age is irrelevant; I am just saying such an age exists), then having sent only 1-2 cold emails is not too little, nor did you invest too much time; you are just young and there haven't been that many worthy occasions yet. If you have taken only a single flight in your life and missed zero, that is not strong evidence that you spend too much time at airports.

Attention can perhaps be compared to a searchlight, and wherever that searchlight lands in the brain, you’re able to “think more” in that area. How does the brain do that? Where is it “taking” this processing power from?

The areas and senses around it perhaps. Could that be why when you’re super focused, everything else around you other than the thing you are focused on seems to “fade”? It’s not just by comparison to the brightness of your attention, but also because the processing is being “squeezed out” of the other areas of your mind.

The principle here is competition among populations of neurons. The purpose is to reduce crosstalk. Higher brain regions can focus on processing only the stuff you're attending to because most of their inputs have been down-regulated, so only the attended ones are sending information.

The principle operates by simple competition. If I'm thinking about colors, higher areas are representing colors. That activates lower areas/neurons representing colors, because they're wired together by associative learning (or just about any useful learning rule will connect ... (read more)
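
A toy numerical sketch of this kind of competition, for concreteness (an illustration only, not a model from the thread; the gain factor and the divisive-normalization rule are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Bottom-up activity of a small population of lower-area neurons (arbitrary units).
activity = rng.uniform(0.5, 1.0, size=8)

# Top-down feedback from higher areas adds gain to the "attended" neurons (indices 2 and 3 here).
attended = np.zeros(8)
attended[[2, 3]] = 1.0
boosted = activity * (1.0 + 2.0 * attended)

# Divisive normalization: the population competes for a roughly fixed activity budget,
# so boosting the attended neurons squeezes activity out of the unattended ones.
normalized = boosted / boosted.sum() * activity.sum()

print("before attention:", np.round(activity, 2))
print("after attention: ", np.round(normalized, 2))
```

The attended units end up louder and everything else quieter, which is the "searchlight" effect described above.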

Someone thought it would be useful for me to quickly write up a note on my thoughts on scalable oversight research, e.g., research into techniques like debate or generally improving the quality of human oversight using AI assistance or other methods. Broadly, my view is that this is a good research direction and I'm reasonably optimistic that work along these lines can improve our ability to effectively oversee somewhat smarter AIs, which seems helpful (on my views about how the future will go).

I'm most excited for:

... (read more)

Taking time away from something and then returning to it later often reveals flaws otherwise unseen. I've been thinking about how to gain the same benefit without needing to take time away.

Changing perspective is the obvious approach.

In art and design, flipping a canvas often forces a reevaluation and reveals much that the eye has grown blind to. Inverting colours, switching to greyscale, obscuring, etc., can have a similar effect.
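
In code, those checks are nearly one-liners, e.g. with Pillow (an illustrative sketch; the filenames are placeholders):

```python
from PIL import Image, ImageOps

# Load a work-in-progress image (placeholder filename).
img = Image.open("draft.png").convert("RGB")

# Cheap perspective shifts that can reveal flaws the eye has adapted to:
ImageOps.mirror(img).save("draft_flipped.png")       # flip the canvas horizontally
ImageOps.invert(img).save("draft_inverted.png")      # invert the colours
ImageOps.grayscale(img).save("draft_greyscale.png")  # drop colour entirely
```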

When writing, speaking written words aloud often helps in identifying flaws.

Similarly, explaining why you've done something – à la rubber duck debugging – can weed out things that don't make sense.

Usefulness of Bayes' Rule in applying mental models
Hi, is the following Bayesian formulation generally well-known, when it comes to applying ideas/mental models to a given Context? "The probability that 'an Idea is applicable' to a Context is equal to: the probability of how often this Context shows up within that Idea's applications, multiplied by the general applicability of the Idea and divided by the general probability of that Context."

P(Idea is applicable | Context) = P(Context shows up | Idea is applied) × P(Idea is applied) / P(Context)

Apologies... (read more)

Pramod Biligiri
Sure, let me try with two different examples.

Let's say you come across a mathematical problem, and you wonder whether it can be solved using Calculus. My statement above implies that this would be a function of: how often problems that "look like this" (yeah, this is slightly handwavy) show up within Calculus, how often problems like this show up at all (!), and how often Calculus is useful/applicable in general (!). The quantitative equation would be:

Probability (Calculus is applicable | Problem) = Probability (Problems look like this | Calculus has been successfully applied) * Probability (Calculus's applicability in general) / Probability (Problems look like this in general)

The "look like this" parts above are unwieldy, but I don't know how else to characterize problems that look similar. There is also messing around with tenses, as @JBlack has pointed out. Perhaps that's a fatal error. I haven't thought through that yet.

Applying this to, say, Cognitive Biases: let's say you are in a decision-making situation, and you're wondering if Loss Aversion might be at play here (although you don't know for sure yet). Applying the same principle, the quantitative equation would be:

Probability (Loss Aversion is applicable | Situation) = Probability (Situation looks like this | Loss Aversion has been successfully applied) * Probability (Loss Aversion's applicability in general) / Probability (Situations like this arise)
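
For concreteness, plugging made-up numbers into the Calculus version (the values below are purely illustrative, not estimates anyone has defended):

```python
# Purely illustrative numbers for the Calculus example above.
p_looks_given_calculus = 0.30  # P(problem looks like this | Calculus successfully applied)
p_calculus_in_general = 0.20   # P(Calculus is applicable in general)
p_looks_in_general = 0.10      # P(a problem looks like this in general)

p_calculus_given_problem = p_looks_given_calculus * p_calculus_in_general / p_looks_in_general
print(round(p_calculus_given_problem, 3))  # 0.6
```
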
cubefox
To make this valid we need two things:

  • Phrase this only in terms of events/propositions, not in terms of single words like "context", "situation" or "problem". Probability only applies to events which occur or don't occur, or to propositions that are true or false. Otherwise it is unclear what e.g. P(x | Situation) means.
  • Bayes' theorem involves exactly two events (or propositions), so we must make sure we express merely similar sounding events ("Situation looks like this" / "Situations like this arise") with one and the same event, in order to not exceed the total amount of two events overall.

Before you read on, you may want to first try the above approach yourself. The following is my attempt at a solution.

Formalization attempt

Here is a possible formalization which uses only two propositions (albeit with two "free variables" and the indexical "current", which arguably is another variable):

  1. Idea A applies to the current context.
  2. The current context is of type B.

P(Idea A applies to the current context | The current context is of type B) = P(The current context is of type B | Idea A applies to the current context) * P(Idea A applies to the current context) / P(The current context is of type B)

Or perhaps more abstractly:

∀A ∀B ∀x: P(A(x) | B(x)) = P(B(x) | A(x)) × P(A(x)) / P(B(x))

Not sure whether this accurately captures what you had in mind. It probably needs to be refined.

Thanks for the formalization attempt. After thinking and reading some more, I feel I've only restated in a vague manner the Hypothesis and Evidence version of Bayes' Theorem - https://en.wikipedia.org/wiki/Bayesian_inference. Quoting from that page: "P(H | E), the posterior probability, is the probability of H given E, i.e., after E is observed. This is what we want to know: the probability of a hypothesis given the observed evidence."

"Idea A applies" would be the Hypothesis in my case, and "current context is of type B" is the Evidence. To restate:
P(Idea A applie... (read more)

Optimality is about winning. Rationality is about optimality.  

Here are some propositions I think I believe about consciousness:

  1. Consciousness in humans is an evolved feature; that is, it supports survival and reproduction; at some point in our evolutionary history, animals with more of it out-competed animals with less.
  2. Some conscious entities sometimes talk truthfully about their consciousness. It is often possible for humans to report true facts about their own objects of consciousness (e.g. self-awareness, qualia, emotions, thoughts, wants, etc.; "OC" for short).
  3. Consciousness is causally upstream of humans emitting
... (read more)

I disagree with (4) in that many sentences concerning nonexistent referents will be vacuously true rather than false. For those that are false, their manner of being false will be different from any of your example sentences.

I also think that for all behavioural purposes, statements involving OC can be transformed into statements not involving OC with the same externally verifiable content. That means that I also disagree with (8) and therefore (9): Zombies can honestly promise things about their 'intentions' as cashed out in future behaviour, and can coor... (read more)

Has anyone tried Duncan Sabien's colour wheel thing?

https://homosabiens.substack.com/p/the-mtg-color-wheel

My colours: Red, followed by Blue, followed by Black

I don't know, I also see them as the traits needed in different stages of a movement.

  • Red = freedom from previous movement.
  • Blue = figuring out how to organise a new movement.
  • Black = executing the movement to its intended ends.

I had a white-blue upbringing (military family) and a blue-green career (see below); my hobbies are black-green-white (compost makes my garden grow to feed our community); my vices are green-red; and my politics are five-color (at least to me).

Almost all of my professional career has been in sysadmin and SRE roles: which is tech (blue) but cares about keeping things reliable and sustainable (green) rather than pursuing novelty (red). Within tech's blue, it seems to me that developer roles run blue-red (build the exciting new feature!); management roles run... (read more)

A paperclip maximizer would finally turn itself into paperclips after having paperclipped the entire universe.

And probably each local instance would paperclip itself when the locally-reachable resources were clipped, with "local" being defined as the area of spacetime which does not have a different instance in progress to clippify it.

decisionproblem.com/paperclips/index2.html demonstrates some features of this (though it has a different take on distribution), and is amazingly playable as a game.

Hi everyone! My name is Ana, I am a sociology student and I am doing a research project at the University of Buenos Aires. In this post, I'm going to tell you a little about the approach I'm working on to understand how discourses surrounding AI Safety are structured and circulated, and I'm going to ask you some questions about your experiences. 

For some time now I have been reading many of the things that are discussed in Less Wrong and in other spaces where AI Safety is published. Although from what I understand and from what I saw in the Less Wrong... (read more)

Nobody at Anthropic can point to a credible technical plan for actually controlling a generally superhuman model. If it’s smarter than you, knows about its situation, and can reason about the people training it, this is a zero-shot regime.

The world, including Anthropic, is acting as if "surely, we’ll figure something out before anything catastrophic happens."

That is unearned optimism. No other engineering field would accept "I hope we magically pass the hardest test on the first try, with the highest stakes" as an answer. Just imagine if flight or nuclear ... (read more)


I mean, a very classical example that I've seen a few times in media is shooting a civilian who is about to walk into a minefield in which multiple other civilians or military members are located. It seems tragic but obviously the right choice to shoot them if they don't heed your warning. 

IDK, I also think it's the right choice to pull the lever in the trolley problem, though the choice becomes less obvious the more it involves active killing as opposed to literally pulling a lever.

Ben Pace
To me the issue goes the other way. The idea of “losing deontological protection” suggests I’m allowed to ignore deontological rules when interacting with someone. But that seems obviously crazy to me. For instance I think there’s a deontological injunction against lying, but just because someone lies doesn’t now mean I’m allowed to kill them. It doesn’t even mean I’m allowed to lie to them. I think lying to them would still be about as wrong as it was before, not a free action I can take whenever I feel like it.
Ben Pace
There is something important to me in this conversation about not trusting one’s consequentialist analysis when evaluating proposals to violate deontological lines, and from my perspective you still haven’t managed to paraphrase this basic ethical idea or shown you’ve understood it, which I feel a little frustrated over. Ah well. I still have been glad of this opportunity to argue it through, and I feel grateful to Neel for that.

Just 13 days after the world was surprised by Operation Spiderweb, where the Ukrainian military and intelligence forces infiltrated Russia with drones and destroyed a major portion of Russia's long-range air offensive capabilities, last night Israel began a major operation against Iran using similar, novel tactics.

Similar to Operation Spiderweb, Israel infiltrated Iran and placed drones near air defense systems. These drones were activated all at once and disabled the majority of these air defense systems, allowing Israel to embark on a major air offensive... (read more)

MondSemmel
By which mechanism would all that defense spending be quickly repurposed towards drone manufacturing? All the things that make big institutions so small-c conservative - like the bureaucracy, the legal apparatus, the procurement rules, and the defense contractors with their long-running contracts - ensure that no such large-scale shift in strategy can occur, no? And even if that did happen, by which mechanism do you convert $1T into actually manufactured drones within any relevant time frame?

I think if you have literal hot war between two superpowers, a lot of stuff can happen. The classical example is of course the US repurposing a large fraction of its economy towards the war effort in World War II. Is that still feasible today? I do not know, but I doubt the defense contractor industry would be the biggest obstacle in the way.

Alexander Gietelink Oldenziel
Much appreciated Habryka-san! You might be interested in my old shortform on the military balance of power between US and China too. It's a bit dated by now; the importance of drones has become much clearer since then [I think the evidence now points to a military technological revolution on par with the introduction of guns], but you may find it of interest regardless.