The recent Gordon Seidoh Worley/Said Achmiz blowup and the subsequent threads (1, 2) it spawned, along with my own involvement in them, got me thinking a bit about this site, on a more nostalgic/meta level.
To be clear, I continue to endorse my belief that Said is right about most of the issues he identifies, about the epistemic standards of this site being low, and about the ever-present risk that, absent consistent and pointed (reasonable) criticism, comment sections and the site culture will inevitably devolve into happy death spirals over applause lights.
And ...
Out of curiosity, what evidence would change your mind?
This one seems pretty easy. If multiple notable past contributors speak out themselves and say that they stopped contributing to LW because of individual persistently annoying commenters, naming Said as one of them, that would be pretty clear evidence. Also socially awkward, of course. But the general mindset of old-school internet forum discourse is that stuff people say publicly under their own accounts exists, and claimed backchannel communications are shit someone made up to win an argument.
As part of the alignment faking paper, I hosted a website with ~250k transcripts from our experiments (including transcripts with alignment-faking reasoning). I didn't include a canary string (which was a mistake).[1]
The current state is that the website has a canary string, a robots.txt, and a terms of service which prohibits training. The GitHub repo which hosts the website is now private. I'm tentatively planning on putting the content behind Cloudflare Turnstile, but this hasn't happened yet.
The data is also hosted in zips in a publicly accessible Goog...
There are definitely still benefits to doing alignment research, but this only justifies the idea that doing alignment research is better than doing nothing.
IMO the thing that matters (for an individual making decisions about what to do with their career) is something more like "on the margin, would it be better to have one additional person do AI governance or alignment/control?"
I happen to think that given the current allocation of talent, on-the-margin it's generally better for people to choose AI policy. (Particularly efforts to contribute technical ex...
On @Gordon Seidoh Worley’s recent post, “Religion for Rationalists”, the following exchange took place:
Kabir Kumar:
Rationality/EA basically is a religion already, no?
Gordon Seidoh Worley:
No, or so say I.
I prefer not to adjudicate this on some formal basis. There are several attempts by academics to define religion, but I think it’s better to ask “does rationality or EA look sufficiently like other things that are definitely religions that we should call them religions”.
I say “no” on the basis of a few factors: …
Said Achmiz:
...EA is [a religion]
I saw a post about a "left leaning/liberal" rationalist Discord channel earlier this month, but the Discord invitation link had expired when I tried it today, and I could not find the original post anymore. Could anyone in that group post the invitation link again, if possible? Much appreciated. (Apologies in advance if this is something that is not allowed.)
I've launched Forecast Labs, an organization focused on using AI forecasting to help reduce AI risk.
Our initial results are promising. We have an AI model that is outperforming superforecasters on the Manifold Markets benchmark, as evaluated by ForecastBench. You can see a summary of the results at our website: https://www.forecastlabs.org/results.
This is just the preliminary scaffolding, and there's significant room for improvement. The long-term vision is to develop these AI forecasting capabilities to a point where we can construct large-scale causal mo...
I was just trying to address a loophole that Umeshisms often seem to miss, but I think this made my statement more confusing: if you are 15 years old (the particular age is irrelevant; I am just saying such an age exists), then having sent only 1-2 cold emails is not too little, nor did you invest too much time; you are just young, and there weren't that many worthy occasions yet. If you have taken just a single flight in your life and missed 0, this is not strong evidence that you spend too much time at airports.
Attention can perhaps be compared to a searchlight, and wherever that searchlight lands in the brain, you’re able to “think more” in that area. How does the brain do that? Where is it “taking” this processing power from?
The areas and senses around it, perhaps. Could that be why, when you’re super focused, everything else around you other than the thing you are focused on seems to “fade”? It’s not just by comparison to the brightness of your attention, but also because the processing is being “squeezed out” of the other areas of your mind.
The principle here is competition among populations of neurons. The purpose is to reduce crosstalk. Higher brain regions can focus on processing only the stuff you're attending to because most of their inputs have been down-regulated, so only the attended ones are sending information.
The principle operates by simple competition. If I'm thinking about colors, higher areas are representing colors. That activates lower areas/neurons representing colors, because they're wired together by associative learning (or just about any useful learning rule will connect ...
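A toy sketch of this kind of competition (my own illustration, not a claim about any specific circuit): give the "attended" population a higher gain and divisively normalize both populations against a shared pool, and the unattended population's output gets squeezed down.

```python
import numpy as np

def competed_responses(drives, attention_gain, sigma=1.0):
    """Apply attentional gain, then divisively normalize against the shared pool.

    drives: raw input drive to each population
    attention_gain: multiplicative gain per population (attention boosts one)
    sigma: semi-saturation constant
    """
    boosted = drives * attention_gain
    return boosted / (sigma + boosted.sum())

drives = np.array([1.0, 1.0])  # equal input to, say, "colors" and "shapes" populations

no_attention = competed_responses(drives, np.array([1.0, 1.0]))
attend_colors = competed_responses(drives, np.array([3.0, 1.0]))

print(no_attention)    # [0.333, 0.333] -- both populations respond equally
print(attend_colors)   # [0.6, 0.2]     -- attended population rises, the other is squeezed out
```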
Someone thought it would be useful for me to quickly write up a note on my thoughts on scalable oversight research, e.g., research into techniques like debate, or generally improving the quality of human oversight using AI assistance or other methods. Broadly, my view is that this is a good research direction, and I'm reasonably optimistic that work along these lines can improve our ability to effectively oversee somewhat smarter AIs, which seems helpful (on my views about how the future will go).
I'm most excited for:
Taking time away from something and then returning to it later often reveals flaws otherwise unseen. I've been thinking about how to gain the same benefit without needing to take time away.
Changing perspective is the obvious approach.
In art and design, flipping a canvas often forces a reevaluation and reveals much that the eye has grown blind to. Inverting colours, switching to greyscale, obscuring, etc., can have a similar effect.
When writing, speaking written words aloud often helps in identifying flaws.
Similarly, explaining why you've done something – à la rubber duck debugging – can weed out things that don't make sense.
Usefulness of Bayes' Rule to the application of mental models
Hi, is the following Bayesian formulation generally well-known, when it comes to applying ideas/mental models to a given Context? "The probability that 'an Idea is applicable' to a Context is equal to: the probability of how often this Context shows up within that Idea's applications, multiplied by the general applicability of the Idea and divided by the general probability of that Context."
P(Idea applies ∣ Context) = P(Context ∣ Idea applies) × P(Idea applies) / P(Context)
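A quick numeric sketch of that formula (hypothetical numbers, purely for illustration):

```python
# Made-up numbers, just to show the arithmetic of the formula above.
p_idea = 0.20                # P(Idea applies): general applicability of the idea
p_context_given_idea = 0.50  # P(Context | Idea applies): how often this context shows up among the idea's applications
p_context = 0.25             # P(Context): how often this context shows up overall

p_idea_given_context = p_context_given_idea * p_idea / p_context
print(p_idea_given_context)  # 0.4 -- the idea becomes a much better bet once you observe this context
```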
Apologies...
Thanks for the formalization attempt. After thinking and reading some more, I feel I've only restated, in a vague manner, the Hypothesis and Evidence version of Bayes' Theorem - https://en.wikipedia.org/wiki/Bayesian_inference. Quoting from that page: "P(H∣E), the posterior probability, is the probability of H given E, i.e., after E is observed. This is what we want to know: the probability of a hypothesis given the observed evidence."
"Idea A applies" would be the Hypothesis in my case, and "current context is of type B" is the Evidence. To restate:
P(Idea A applie...
Here are some propositions I think I believe about consciousness:
I disagree with (4) in that many sentences concerning nonexistent referents will be vacuously true rather than false. For those that are false, their manner of being false will be different from any of your example sentences.
I also think that for all behavioural purposes, statements involving OC can be transformed into statements not involving OC with the same externally verifiable content. That means that I also disagree with (8) and therefore (9): Zombies can honestly promise things about their 'intentions' as cashed out in future behaviour, and can coor...
Has anyone tried Duncan Sabien's colour wheel thing?
https://homosabiens.substack.com/p/the-mtg-color-wheel
My colours: Red, followed by Blue, followed by Black
I don't know, I also see them as the traits needed in different stages of a movement.
I had a white-blue upbringing (military family) and a blue-green career (see below); my hobbies are black-green-white (compost makes my garden grow to feed our community); my vices are green-red; and my politics are five-color (at least to me).
Almost all of my professional career has been in sysadmin and SRE roles: which is tech (blue) but cares about keeping things reliable and sustainable (green) rather than pursuing novelty (red). Within tech's blue, it seems to me that developer roles run blue-red (build the exciting new feature!); management roles run...
And probably each local instance would paperclip itself once the locally reachable resources were clipped, "local" being defined as the region of spacetime that does not already have a different instance in progress to clippify it.
decisionproblem.com/paperclips/index2.html demonstrates some features of this (though it has a different take on distribution), and is amazingly playable as a game.
Hi everyone! My name is Ana, I am a sociology student and I am doing a research project at the University of Buenos Aires. In this post, I'm going to tell you a little about the approach I'm working on to understand how discourses surrounding AI Safety are structured and circulated, and I'm going to ask you some questions about your experiences.
For some time now I have been reading many of the things that are discussed on Less Wrong and in other spaces where AI Safety work is published. Although from what I understand and from what I saw in the Less Wrong...
Nobody at Anthropic can point to a credible technical plan for actually controlling a generally superhuman model. If it’s smarter than you, knows about its situation, and can reason about the people training it, this is a zero-shot regime.
The world, including Anthropic, is acting as if "surely, we’ll figure something out before anything catastrophic happens."
That is unearned optimism. No other engineering field would accept "I hope we magically pass the hardest test on the first try, with the highest stakes" as an answer. Just imagine if flight or nuclear ...
I mean, a very classical example that I've seen a few times in media is shooting a civilian who is about to walk into a minefield in which multiple other civilians or military members are located. It seems tragic but obviously the right choice to shoot them if they don't heed your warning.
IDK, I also think it's the right choice to pull the lever in the trolley problem, though the choice becomes less obvious the more it involves active killing as opposed to literally pulling a lever.
Just 13 days after the world was surprised by Operation Spiderweb, where the Ukrainian military and intelligence forces infiltrated Russia with drones and destroyed a major portion of Russia's long-range air offensive capabilities, last night Israel began a major operation against Iran using similar, novel tactics.
Similar to Operation Spiderweb, Israel infiltrated Iran and placed drones near air defense systems. These drones were activated all at once and disabled the majority of these air defense systems, allowing Israel to embark on a major air offensive...
I think if you have literal hot war between two superpowers, a lot of stuff can happen. The classical example is of course the US repurposing a large fraction of its economy towards the war effort in World War II. Is that still feasible today? I do not know, but I doubt the defense contractor industry would be the biggest obstacle in the way.