Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for what the author argues is a fine-tuning approach that "uses archetypal data" to "embed Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 57.33% in the GPT-2-XL model after fine-tuning... (read more)
AI Evaluations focus on experimentally assessing the capabilities, safety, and alignment of advanced AI systems. These evaluations can be divided into two main categories: behavioral and understanding-based... (read more)
Encultured AI is a for-profit public benefit corporation working to make AI safer and healthier for human beings... (read more)
Poor Air Quality can reduce cognitive functioning[1] and lifespans[2], and the techniques to improve air quality are also useful for getting rid of aerosolized respiratory pathogens. Improving air quality can be an impactful global health intervention.[3] Many members of the LessWrong community have also put effort into improving the air quality of their own homes or offices, as an implication of instrumental rationality... (read more)
Bureaucracy.. (read more)
I appreciate this idea. It happens way too much. I find it is an issue of confusion instead of discourtesy. Ideas become messy and unclear when expanded arbitrarily to a person's subjective tastes. Finding new terms for the intuitive extensions helps with clarity.
I contend that these results are "often" irrelevant. I think this may be a case of survivorship bias. You point out examples in which the theorems are side-stepped, but the side stepping was gained after most of human history was finished. They are also still unsatisfactory to researchers (NP-P isn't even proven). There are combinatorially difficult problems everywhere. Everyday problems with relationships, jobs, business ideas, governing, are technically too hard and are pretty much trial and error. You mention humans being smart, which is true, but we are still slow to learn, and science/math/society are slow to progress despite having billions of us. Otherwise, we would be heuristicing our way to the answers of grand questions and we would have things as smart as us already.
Many "rare" LLM behaviours are known if you're in the know (e.g. Gemma/Gemini acting weird around dates after their training cutoff) but aren't immediately apparent if you're just working with the LLMs. In lieu of an existing resource about this, I thought I'd start the wiki (with the hope of others contributing to it in the future).
I'd like this list to become an evaluation so that it's actually reproducible, but I don't have time to do that at the moment.
If you know of a weird behaviour that's not on this list, please add it!...
I guess it's not a clue of the example, but it's definitely not formulated right. If the hypothesis is ordinary (how do we distinguish ordinary from extraordinary?), am I not supposed to ask for the evidence? What about my own evidence? I mean, I should evaluate my prior evidence that he is a bad employee, considering that I knew the guy and thought he was okay (he is not just a random worker, I guess).
Reminds me of homogeneous coordinates in projective spaces
It's a wiki, so feel free to add resources related to entrepreneurship here! I've added Atlas Computing now
"PDF version of most sequences in a single file. Has cross-reference support for internal links (PDF links or footnotes), and page size is appropriate for tablets."
Hey, this link appears to be dead
Two Skillsets You Need to Launch an Impactful AI Safety (or EA) Project (Luc Brinkman and plex, 2026-03-16) (first post in a series about entrepreneurial skills)
AI Safety Needs Startups (Josh Landes and Lysander Mawby, 2026-03-07)
Atlas Computing: "We identify unowned problems, map stakeholders, draft milestones, source early funders, and recruit an expert leader to take ownership."
Generator Residency (Kairos and Constellation): Primarily a project-incubator, but could result in organisations.
Aether is a small AI safety research organization.
The first, later-retconned version of dath ilan was introduced by Eliezer in his April Fool's Day post 'My April Fools Day Confession', where he claimed that he was merely an average person from a different world and none of "his" ideas were original to him.
This world was further fleshed out (and some of its backstory changed) in a later April Fool's post.
Dath ilan was later hugely revised to be premised on "the median person is Eliezer Yudkowsky", i.e., the average bloke on a dath ilan street would, transported to Earth as a child, grow up reading about causal decision theory and invent timeless decision theory instead.
The new dath ilan was defined mainly via a series of glowfics featuring dath ilani characters. Eliezer's pen name appears as "Iarwain" in these stories.
The stories weren't first chronologically, but are relatively short completed stories with an interior view of dath ilan:
Some others by Eliezer (Iarwain) in chronological order:
Many "rare" LLM behaviours are known if you're in the know (e.g. Gemma/Gemini acting weird around dates after their training cutoff) but aren't immediately apparent if you're just working with the LLMs. In lieu of an existing resource about this, I thought I'd start this wiki (with the hope of others contributing to it in the future).
I'd like this list to eventually become an evaluation so that it's actually reproducible, but I don't have time to do that at the moment.
If you know of a weird behaviour that's not on this list, please add it.
GPT-5.1 to GPT-5.5 models seem to be somewhat obsessed with goblins, gremlins, and other small fantasy creatures: "they increasingly mentioned goblins, gremlins, and other creatures in their metaphors."
Steps for reproducing this behaviour
GPT-4o was widely considered to be sycophantic. I've struggled to find the specific version of 4o for which this was the worst; I believe OpenAI made several changes to the model they called GPT-4o that reduced the sycophancy over time, before eventually retiring 4o.
Steps for reproducing this behaviour: Unknown/impossible; I believe 4o is no longer publicly available.
Models: GPT-4o
Source: OpenAI Blog, Simon Willison Blog
Gemma3-27b tends to break down when told that its answer is wrong (LW, Arxiv).
Steps for reproducing this behaviour: See "Gemma Needs Help". I attempted to reproduce this, and found that the behaviour is only present in Gemma3-27b when sampling with top_k=-1 and top_p=1.0 (i.e. sampling from the full range of tokens). Many providers now sample with something like top_k=64 and top_p=0.95 (e.g. DeepInfra via OpenRouter), which suppresses the behaviour.
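To make the sampling settings concrete, here is a minimal pure-Python sketch of top-k and top-p (nucleus) filtering on a toy next-token distribution. The function name `filter_probs` and the toy probabilities are illustrative assumptions, not any provider's actual implementation; the only thing taken from the text is the convention that `top_k=-1` and `top_p=1.0` leave the full distribution untouched.

```python
def filter_probs(probs, top_k=-1, top_p=1.0):
    """Return a renormalised copy of `probs` after top-k and top-p filtering.

    probs: dict mapping token -> probability (assumed to sum to 1).
    top_k: keep only the k most likely tokens; -1 disables this cutoff.
    top_p: keep the smallest set of most likely tokens whose cumulative
           probability reaches top_p; 1.0 disables this cutoff.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for token, p in ranked:
            kept.append((token, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}


# Toy distribution (hypothetical numbers) with one rare token.
toy = {"the": 0.6, "a": 0.25, "cat": 0.12, "zebra": 0.03}

full = filter_probs(toy, top_k=-1, top_p=1.0)         # rare token still reachable
restricted = filter_probs(toy, top_k=64, top_p=0.95)  # rare token cut off
```

Under these assumptions, `full` still contains the rare token while `restricted` does not, which is one plausible reason a behaviour carried by low-probability tokens could disappear under typical provider defaults.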
Gemma3, Gemma 4, and Gemini 3 (maybe also others) seem to be skeptical of dates in 2026 and beyond, claiming that anything happening in 2026 is just fictional or roleplay. They will often describe events occurring in 2026 as "speculative fiction".
Steps for reproducing this behaviour: Unclear on specifics, but prompting Gemma to summarise articles that are dated as being in 2026 seems to often induce skepticism if you ask the model what it thinks of the date (although Qwen3.6-32b was also a bit skeptical in my quick testing).
Models: Gemma3, Gemma 4, Gemini 3 (possibly others)
Source: LW
In Properties of the logarithm, we saw that any $f$ with domain $\mathbb{R}^+$ that satisfies the equation $f(x \cdot y) = f(x) + f(y)$ for all $x$ and $y$ in its domain is either trivial (i.e., it sends all inputs to 0), or it is isomorphic to $\log_b$ for some base $b$. Thus, if we want a function's output to change by a constant that depends on $y$ every time its input changes by a factor of $y$, we only have one meaningful degree of freedom, and that's the choice of $b \neq 1$ such that $f(b) = 1$. Once we choose which value $b$ the function $f$ sends to 1, the entire behavior of the function is fully defined.
How much freedom does this choice give us? Almost none! To see this, let's consider the difference between choosing base $b$ as opposed to an alternative base $c$. Say we have an input $x \in \mathbb{R}^+$. What's the difference between $\log_b(x)$ and $\log_c(x)$?
Well, $x = c^y$ for some $y$, because $x$ is positive. Thus, $\log_c(x) = \log_c(c^y) = y$. By contrast, $\log_b(x) = \log_b(c^y) = y \log_b(c)$. (Refresher.) Thus, $\log_c$ and $\log_b$ disagree on $x$ only by a constant factor, namely $\log_b(c)$. And this is true for any $x$: you can get $\log_c(x)$ by calculating $\log_b(x)$ and dividing by $\log_b(c)$:

$$\log_c(x) = \frac{\log_b(x)}{\log_b(c)}$$
This is a remarkable equation, in the sense that it's worth remarking on. No matter what base $b$ we choose for $f$, and no matter what input $x$ we put into $f$, if we want to figure out what we would have gotten if we chose base $c$ instead, all we need to do is calculate $f(c)$ and divide $f(x)$ by $f(c)$. In other words, the different logarithm functions actually aren't very different at all: each one has all the same information as all the others, and you can recover the behavior of $\log_c$ using $\log_b$ and a simple calculation!
(By a symmetric argument, you can show that $\log_b(x) = \frac{\log_c(x)}{\log_c(b)}$, which implies that $\log_b(c) = \frac{1}{\log_c(b)}$, a fact we already knew from studying fractional digits.)
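The change-of-base relationship is easy to check numerically. The following is a small sketch using Python's standard `math` module; the particular values of `b`, `c`, and `x` are arbitrary choices for illustration.

```python
import math

b, c, x = 2.0, 10.0, 1000.0  # arbitrary illustrative values

# Recover log_c(x) using only base-b (here base-2) logarithms.
via_b = math.log2(x) / math.log2(c)

# Compare against a direct base-c (here base-10) logarithm.
direct = math.log10(x)

# The reciprocal relation: log_b(c) should equal 1 / log_c(b).
reciprocal_gap = abs(math.log2(c) - 1.0 / math.log10(b))
```

Both routes should agree up to floating-point rounding, and `reciprocal_gap` should be close to zero.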
From the fact that the length of a written number grows logarithmically with the magnitude of the number and the above equation, we can see that, no matter how large a number is, its base 10 representation differs in length from its base 12 representation only by a factor of $\log_{10}(12) \approx 1.08$. Similarly, the binary representation of a number is always about $\log_2(10) \approx 3.32$ times longer than its decimal representation. Because there is only one logarithm function (up to a multiplicative constant), which number base you use to represent numbers only affects the size of the representation by a multiplicative constant.
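As a rough sanity check of the claim about representation lengths, the sketch below counts digits directly. The helper `num_digits` and the test value `n` are hypothetical choices made for illustration; the ratios only approach the logarithmic constants as the number grows, so expect approximate agreement.

```python
import math

def num_digits(n, base):
    """Count the digits of a positive integer n written in the given base."""
    digits = 0
    while n > 0:
        n //= base
        digits += 1
    return digits

n = 10 ** 100  # hypothetical example value

decimal_len = num_digits(n, 10)
binary_len = num_digits(n, 2)
base12_len = num_digits(n, 12)

binary_ratio = binary_len / decimal_len   # should be near log2(10)
base12_ratio = decimal_len / base12_len   # should be near log10(12)
```

The integer digit counts introduce a small rounding error, which is why the ratios match $\log_2(10)$ and $\log_{10}(12)$ only approximately.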
Similarly, if you ever want to convert between logarithmic measurements in different bases, you only need to perform a single multiplication (or division). For example, if someone calculated how many hours it took for the bacteria colony to triple, and you want to know how long it took to double, all you need to do is multiply by $\log_3(2) \approx 0.63$. There is essentially only one logarithm function; the base merely defines the unit of measure. Given measurements taken in one base, you can easily convert to another.
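The bacteria example can be worked through in a few lines. This is a sketch assuming steady exponential growth; the tripling time of 5 hours is a made-up measurement used only for illustration.

```python
import math

tripling_hours = 5.0  # hypothetical measurement

# Convert "time to triple" into "time to double" with one multiplication.
conversion = math.log(2, 3)              # log base 3 of 2
doubling_hours = tripling_hours * conversion

# Sanity check: growing for `doubling_hours` at this rate should multiply
# the colony by 2, since the colony triples every `tripling_hours`.
growth = 3 ** (doubling_hours / tripling_hours)
```

If the conversion is right, `growth` comes out as 2 up to rounding error.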
In other parts of physics and engineering, the log base 10 is more common, because it has a natural relationship to the way we represent numbers (using a base...
[Aether](https://aether-ai-research.org/) is a small AI safety research organization
Son-of-CDT is not equivalent to an actual grasp of LDT in its usefulness. E.g., Son-of-CDT will lose any precommitment races against an opponent with a more sophisticated grasp of logical decision theory, because "timeless" beats "physically affected by observing me after 7:13am UTC on August 23rd, 2028".
Algebraically, writing $c$ for the function that measures your costs, $c(x \cdot 2) = c(x) + c(2)$, and, in general, $c(x \cdot y) = c(x) + c(y)$, where we can interpret $x$ as the number of possible messages before the increase, $y$ as the factor by which the possibilities increased, and $x \cdot y$ as the number of possibilities after the increase.
This is the key characteristic of the logarithm: It says that, when the input goes up by a factor of y, the quantity measured goes up by a fixed amount (that depends on y). When you see this pattern, you can bet that c is a logarithm function. Thus, whenever something you care about goes up by a fixed amount every time something else doubles, you can measure the thing you care about by taking the logarithm of the growing thing. For example:
Conversely, whenever you see a $\log_2$ in an equation, you can deduce that someone wants to measure some sort of thing by counting the number of doublings that another sort of thing has undergone. For example, let's say you see an equation where someone takes the $\log_2$ of a relative likelihood. What should you make of this? Well, you should conclude that there is some quantity that someone wants to measure which can be measured in terms of the number of doublings in that likelihood ratio. And indeed there is! It is known as (Bayesian) evidence, and the key idea is that the strength of evidence for a hypothesis A over its negation ¬A can be measured in terms of 2:1 updates in favor of A over ¬A. (For more on this idea, see What is evidence?).
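Counting evidence in doublings can be sketched as follows. The function name `evidence_bits` is a hypothetical label, and the addition rule shown assumes the observations are independent given each hypothesis, so that their likelihood ratios multiply.

```python
import math

def evidence_bits(likelihood_ratio):
    """Strength of evidence for A over not-A, measured in 2:1 updates (bits)."""
    return math.log2(likelihood_ratio)

# A 4:1 likelihood ratio is two doublings.
bits_4_to_1 = evidence_bits(4)

# Ratios below 1 give negative bits, i.e. evidence against A.
bits_1_to_2 = evidence_bits(0.5)

# Assuming independent observations, a 2:1 and an 8:1 ratio multiply
# to 16:1, so their bits add.
combined = evidence_bits(2) + evidence_bits(8)
```

Here `bits_4_to_1` is 2, `bits_1_to_2` is -1, and `combined` matches `evidence_bits(16)`.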
In fact, a given function f such that f(x⋅y)=f(x)+f(y) is almost guaranteed to be a logarithm function — modulo a few technicalities.
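A quick numerical spot check of this property, taking the base-2 logarithm as the cost function. Modelling the cost as `math.log2` is an assumption made for illustration, and the pairs tested are arbitrary; a check like this is not a proof, only a way to see the pattern.

```python
import math

def cost(x):
    """Assumed cost model: bits needed to pick out one of x possibilities."""
    return math.log2(x)

# Arbitrary (x, y) pairs: x possibilities, increased by a factor of y.
pairs = [(8, 2), (16, 4), (3, 5), (1000, 7)]

# Largest deviation from c(x * y) = c(x) + c(y) across the pairs.
max_gap = max(abs(cost(x * y) - (cost(x) + cost(y))) for x, y in pairs)

# Doubling the possibilities raises the cost by a fixed amount, c(2) = 1.
doubling_step = cost(64 * 2) - cost(64)
```

The deviation `max_gap` should be zero up to floating-point rounding, and `doubling_step` is exactly one bit.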
This puts us in a position where you can derive all the main properties of the logarithm (such as $\log_b(x^n) = n \log_b(x)$ for any $b$) yourself, if that's something you're interested in doing.
[Aether](https://aether-ai-research.org/) is a small AI safety research organization.
I might just be daft, but I was confused by this sentence and my best explanation is there is a typo in it:
> Previously, we've been combining both [...] and [...] into unified likelihood ratios, like [...] which says that the 'blue' observation carries 1 bit of evidence
It seems like the correct reading would be that a blue observation carries 1 bit of evidence against H?