leogao

Comments
leogao

Basically agree - I'm generally a strong supporter of looking at the loss drop in terms of effective compute. Loss recovered using a zero-ablation baseline is really quite wonky and gives misleadingly big numbers.
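To make the wonkiness concrete, here's a rough sketch of the usual fraction-of-loss-recovered computation; all the loss numbers are invented for illustration:

```python
# Sketch of fraction-of-loss-recovered with a zero-ablation baseline.
# All loss values below are invented for illustration.

def loss_recovered(loss_clean, loss_with_sae, loss_baseline):
    """Fraction of the clean-to-baseline loss gap that the SAE closes."""
    return (loss_baseline - loss_with_sae) / (loss_baseline - loss_clean)

loss_clean = 3.0      # CE loss of the unmodified model
loss_with_sae = 3.3   # CE loss with activations replaced by SAE reconstructions
loss_zero_abl = 12.0  # CE loss with the activations zero-ablated

# Against the zero-ablation baseline the SAE looks nearly perfect...
print(loss_recovered(loss_clean, loss_with_sae, loss_zero_abl))  # ~0.97

# ...even though the 0.3 nats of extra loss can correspond to a large
# effective-compute regression for a well-trained model.
```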

I also agree that reconstruction is not the only axis of SAE quality we care about. I propose explainability as the other axis - whether we can make necessary and sufficient explanations for when individual latents activate. Progress then looks like pushing this Pareto frontier.

leogao

Extremely valid; you've convinced me that "atom" is probably a bad term for this.

leogao

I like the word "atom" to refer to units inside an SAE.

leogao

Keep in mind that if, hypothetically, there were major compute efficiency tricks to be had, they would likely not be shared publicly. So the absence of publicly known techniques is not strong evidence in either direction.

Also, in general I start from a prior of being skeptical of papers claiming their models are comparable to or better than GPT-4. It's very easy to mislead with statistics - for example, human preference comparisons depend very heavily on the task distribution and on how discerning the raters are. I haven't looked deeply into Llama 405B specifically, though.
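As a toy illustration of the task-distribution point (every number below is invented): the same per-task preference rates can produce very different headline win rates under different task mixes.

```python
# Toy illustration: identical per-task win rates, different headline numbers.
# All numbers are invented.

win_rate = {"chitchat": 0.55, "coding": 0.35, "reasoning": 0.30}

mix_easy = {"chitchat": 0.8, "coding": 0.1, "reasoning": 0.1}  # easy-prompt-heavy eval
mix_hard = {"chitchat": 0.2, "coding": 0.4, "reasoning": 0.4}  # hard-task-heavy eval

def headline(mix):
    return sum(mix[task] * win_rate[task] for task in win_rate)

print(headline(mix_easy))  # 0.505 -> reads as "roughly on par with GPT-4"
print(headline(mix_hard))  # 0.37  -> reads as clearly behind
```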

leogao

This is likely not the first instance, but OpenAI was already using the word "aligned" in this way in 2021 in the Codex paper.

https://arxiv.org/abs/2107.03374 (section 7.2)

leogao

investment in anything speculative, including alignment and AGI research, is likely to decrease if the economy is not doing great.

leogao

for a sense of scale of just how bubbly things can get: Bitcoin has a market cap of ~$1T, and the entirety of crypto ~$2T. Crypto does produce some real value, but probably on the order of 1% of that market cap. So it's not at all unheard of for speculation to account for literally trillions of dollars of market cap (or ~tens of billions of dollars of earnings per year, at a reasonable P/E ratio).
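spelling out the arithmetic (the P/E of 25 is just my pick for "reasonable"):

```python
# back-of-envelope, all inputs assumed: crypto market cap ~$2T, real value
# ~1% of that, and a "reasonable" P/E of 25.

market_cap = 2e12
real_value_fraction = 0.01
pe_ratio = 25

speculative_cap = market_cap * (1 - real_value_fraction)  # ~$1.98T of speculation
implied_earnings = speculative_cap / pe_ratio             # earnings needed to justify it

print(f"speculative market cap: ${speculative_cap / 1e12:.2f}T")
print(f"implied earnings at P/E {pe_ratio}: ${implied_earnings / 1e9:.0f}B/yr")
# -> speculative market cap: $1.98T
# -> implied earnings at P/E 25: $79B/yr, i.e. ~tens of billions per year
```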

leogao

an economic recession, with a subsequent reduction in speculative research (including towards AGI), seems very plausible.

AI (by which I mean, like, big neural networks and whatever) is not that economically useful right now. furthermore, current usage figures are likely an overestimate of true economic usefulness, because a very large fraction of that usage is likely to be bubbly spending that will itself dry up if there is a recession (legacy companies putting LLMs into things to look cool, startups burning money without PMF, consumers with disposable income to spend on entertainment).

it will probably still be profitable to develop AI tech, but things will be much more tethered to consumer usefulness.

this probably doesn't set AGI back that much, but I think people are heavily underrating this as a possibility. it would also probably heavily impact the amount of alignment work done at labs.

leogao

even if scaling does eventually solve the reliability problem, people are very plausibly overestimating how far along capabilities are, and how fast progress is, because the most impressive thing that can be done with 90% reliability likely advances faster than the most impressive thing that can be done with 99.9% reliability.
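one toy way to see the gap (the independent-steps model is an illustration, not a claim about how real tasks decompose): if a task takes k steps that each succeed with probability p, the achievable horizon at fixed overall success scales like 1/(1-p), so 90% vs 99.9% per-step reliability is roughly a 100x difference in horizon.

```python
# toy model (my framing): a task is a chain of k independent steps, each
# succeeding with per-step reliability p; overall success is p**k. the
# longest chain doable at >=50% overall success is log(0.5)/log(p).
import math

def max_chain_length(p, target=0.5):
    return math.log(target) / math.log(p)

for p in (0.90, 0.999):
    print(f"p = {p}: ~{max_chain_length(p):.0f} steps at 50% overall success")
# p = 0.9:   ~7 steps
# p = 0.999: ~693 steps  (roughly 100x longer horizons)
```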

leogao

I think even if failures are automatically detectable, it's quite annoying. the cost is very logarithmic: there's a very large cliff in effort when going from zero manual intervention required to any manual intervention required whatsoever; and as the amount of manual intervention continues to increase, you can invest in infrastructure to make it less painful, and then delegate the work out to other people.
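a cartoon of that cost curve (the functional form and constants are invented purely for illustration):

```python
# cartoon cost model (invented): a fixed cliff for any manual intervention
# at all, then sublinear growth as tooling and delegation absorb
# additional interventions.
import math

def intervention_cost(n, cliff=100.0, slope=10.0):
    if n == 0:
        return 0.0  # fully automatic: nobody has to be on call
    return cliff + slope * math.log(n)

for n in (0, 1, 10, 100, 1000):
    print(n, round(intervention_cost(n)))
# 0 -> 0, 1 -> 100, 10 -> 123, 100 -> 146, 1000 -> 169
```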
