khafra

Comments

khafra

It may be time to revisit this question. With Owain Evans et al. discovering a generalized evil vector in LLMs, and older work like [Pretraining Language Models with Human Preferences](https://www.lesswrong.com/posts/8F4dXYriqbsom46x5/pretraining-language-models-with-human-preferences) that could use a follow-up, AI in the current paradigm seems ripe for some experimentation with parenting practices in pre-training--perhaps something like affect markers for the text that goes in, or pretraining on children's literature before going on to the more technically and morally complex text?
I haven't run any experiments of my own, but this doesn't seem obviously stupid to me.
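Something like the following toy sketch is the flavor of experiment I have in mind; the affect-tag format, the keyword-based sentiment proxy, and the word-length complexity proxy are placeholder assumptions, not a real pipeline:

```python
# Toy sketch only: order pretraining documents from simple to complex and
# prepend a crude affect marker before handing them to a normal pretraining loop.
import re

def affect_marker(text: str) -> str:
    """Very crude affect tag; a real setup would use a trained classifier."""
    negative_hits = len(re.findall(r"\b(kill|hate|hurt|steal)\b", text.lower()))
    return "<|affect:negative|>" if negative_hits else "<|affect:neutral-or-positive|>"

def complexity(text: str) -> float:
    """Proxy for difficulty: mean word length. Children's literature scores low."""
    words = text.split()
    return sum(len(w) for w in words) / max(len(words), 1)

def build_curriculum(documents: list[str]) -> list[str]:
    """Simplest (child-level) documents first, each prefixed with its affect tag."""
    ordered = sorted(documents, key=complexity)
    return [affect_marker(doc) + "\n" + doc for doc in ordered]

corpus = [
    "The conspirators resolved to steal the ledger and hurt the witness.",
    "The cat sat on the mat. The cat was kind to the dog.",
    "Thermodynamic equilibrium constrains the partition function's derivatives.",
]
for doc in build_curriculum(corpus):
    print(doc, "\n---")
```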

khafra

> When there's little incentive against classifying harmless documents, and immense cost to making a mistake in the other direction, I'd expect overclassification to be rampant in these bureaucracies.

Your analysis of the default incentives is correct. However, if there is any institution that has noticed the mounds of skulls, it is the DoD. Overclassification, and classification for inappropriate reasons (explicitly enumerated in written guidance: avoiding embarrassment, covering up wrongdoing), are not allowed, and the DoD carries out audits of classified data to identify and correct overclassification.


It’s possible they’re not doing enough to fight against the natural incentive gradient toward overclassification, but they’re trying hard enough that I wouldn’t expect positive EV from disregarding all the rules.

khafra

As someone who has been allowed access into various private and government systems as a consultant, I think the near mode view for classified government systems is different for a reason. 


E.g., data is classified as Confidential when its release could cause damage to national security. It's Secret if it could cause serious damage to national security, and it's Top Secret if it could cause exceptionally grave damage to national security. 
People lose their jobs for accidentally putting a classified document onto the wrong system, even if it's still owned by the government and protected (but, protected at an insufficient level for the document). People go to jail for putting classified data onto the wrong system on purpose, even if they didn't intend to, say, sell it to the Chinese government. 

Bringing in personnel who haven't had the standard single-scope background investigation and been granted a clearance, and a new set of computers which has not gone through any accreditation and authorization process, and giving unrestricted write and read access to classified data is technically something the president could allow. But it's a completely unprecedented level of risk to assume; and AFAICT the president has not actually written any authorizations for doing this. 

There is, actually, a Government Accountability Office which does audits; they have identified billions in fraud, waste, and abuse, identified the perpetrators for punishment, and remediated the programs at fault. They have done it without unprecedented breaches in national security, or denying lawful, non-fraudulent payments from the US Treasury.
(Also, outside of my personal area of expertise, I believe denying lawful, non-fraudulent payments from the US Treasury is crossing a really big Chesterton's Fence. GPT-4o estimated a $1T-$5T impact from Treasury bond yield spreads, forex USD reserves, CDS spreads on US foreign debt, and loss of seigniorage in global trade, depending on how rare and targeted the payment denial is.)
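For intuition on the order of magnitude, here is a back-of-envelope sketch for just the yield-spread channel; every number in it is an illustrative assumption, not a figure from GPT-4o or from official data:

```python
# Back-of-envelope only: every input is an illustrative assumption.
marketable_debt = 26e12    # assumed outstanding marketable US Treasury debt, USD
spread_increase = 0.010    # assumed persistent +100 bps risk premium after a missed payment
horizon_years = 10         # assumed horizon over which the premium applies as debt rolls over

extra_interest = marketable_debt * spread_increase * horizon_years
print(f"extra interest cost over {horizon_years} years: ${extra_interest / 1e12:.1f}T")
# ~$2.6T, inside the $1T-$5T range above; the forex-reserve, CDS, and
# seigniorage channels would be on top of that.
```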

khafra

The quoted paragraph is a reference to a C.S. Lewis essay about living under the threat of global thermonuclear war. Making it slightly more accurate by using that phrase instead of "if we are going to be destroyed by Zizianism" damages the euphony and symmetry with the original quote.

khafra

This is the most optimistic believable scenario I've seen in quite a while!

khafra

> And yet it behaves remarkably sensibly. Train a one-layer transformer on 80% of possible addition-mod-59 problems, and it learns one of two modular addition algorithms, which perform correctly on the remaining validation set. It's not a priori obvious that it would work that way! There are other possible functions on ℤ/59ℤ compatible with the training data.

Seems like Simplicia is missing the worrisome part--it's not that the AI will learn a more complex algorithm which is still compatible with the training data; it's that the simplest several algorithms compatible with the training data will kill all humans OOD.
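For anyone who wants to poke at the quoted setup directly, here's a minimal sketch of that experiment; the model width, optimizer settings, and step count are my own guesses rather than anything from the post:

```python
# Minimal sketch of the quoted experiment; hyperparameters are guesses, not the
# original setup. Requires PyTorch.
import torch
import torch.nn as nn

P = 59                                        # modulus
pairs = [(a, b) for a in range(P) for b in range(P)]
perm = torch.randperm(len(pairs))
split = int(0.8 * len(pairs))                 # train on 80% of all problems

def to_tensors(idx):
    ab = torch.tensor([pairs[int(i)] for i in idx])   # (N, 2) input tokens
    y = (ab[:, 0] + ab[:, 1]) % P                     # (N,) correct sums mod 59
    return ab, y

train_x, train_y = to_tensors(perm[:split])
test_x, test_y = to_tensors(perm[split:])

class OneLayerTransformer(nn.Module):
    def __init__(self, d_model=128, nhead=4):
        super().__init__()
        self.tok = nn.Embedding(P, d_model)
        self.pos = nn.Embedding(2, d_model)
        self.layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=512, dropout=0.0, batch_first=True)
        self.out = nn.Linear(d_model, P)

    def forward(self, ab):                            # ab: (N, 2)
        pos = torch.arange(2, device=ab.device)
        h = self.layer(self.tok(ab) + self.pos(pos))
        return self.out(h[:, -1])                     # predict from the last position

model = OneLayerTransformer()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(5001):
    opt.zero_grad()
    loss = loss_fn(model(train_x), train_y)
    loss.backward()
    opt.step()
    if step % 500 == 0:
        with torch.no_grad():
            acc = (model(test_x).argmax(-1) == test_y).float().mean().item()
        print(f"step {step}: train loss {loss.item():.3f}, held-out acc {acc:.3f}")
```

In the grokking literature, small algorithmic datasets trained with heavy weight decay like this tend to end up generalizing to the held-out pairs, often long after the training loss has already gone to zero.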

khafra

AFAICT, in the Highwayman example, if the would-be robber presents his ultimatum as "give me half your silk or I burn it all," the merchant should burn it all, same as if the robber says "give me 1% of your silk or I burn it all." 
But a slightly more sophisticated highwayman might say "this is a dangerous stretch of desert, and there are many dangerous, desperate people in those dunes. I have some influence with most of the groups in the next 20 miles. For x% of your silk, I will make sure you are unmolested for that portion of your travel." 
Then the merchant actually has to assign probabilities to a bunch of events, calculate Shapley values, and roll some dice for his mixed strategy.
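As a toy illustration of the kind of calculation involved (the probabilities are made up, and it leaves out the Shapley-value and mixed-strategy parts):

```python
# Toy illustration only: made-up probabilities; ignores the Shapley-value and
# mixed-strategy parts of the full analysis.
def expected_loss(pay_fraction, p_attack_unprotected, p_attack_protected, p_protection_real):
    """Return (expected silk lost if refusing, expected silk lost if paying)."""
    refuse = p_attack_unprotected
    p_attack_if_paid = (p_protection_real * p_attack_protected
                        + (1 - p_protection_real) * p_attack_unprotected)
    pay = pay_fraction + (1 - pay_fraction) * p_attack_if_paid
    return refuse, pay

refuse, pay = expected_loss(pay_fraction=0.10,
                            p_attack_unprotected=0.30,   # assumed risk with no deal
                            p_attack_protected=0.05,     # assumed risk if the deal is honored
                            p_protection_real=0.50)      # assumed chance he can deliver
print(f"expected loss if refusing: {refuse:.2f}, if paying: {pay:.2f}")
```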

khafra

Tangentially to Tanagrabeast's "least you can do" suggestion, as a case report: I came out to my family as an AI xrisk worrier over a decade ago, when one could still do so in a fairly lighthearted way. They didn't immediately start donating to MIRI and calling their senators to request an AI safety Manhattan Project, but they did agree with the arguments I presented, and check up with me, on occasion, about how the timelines and probabilities are looking.

I have had two new employers since then, and a few groups of friends; and with each, when the conversation turns to AI (as it often does, over the last half-decade), I mention my belief that it's likely going to kill us all, and expand on Instrumental Convergence, RAAP, and/or "x-risk, from Erewhon, to IJ Good, to the Extropians," depending on which aspect people seem interested in. I've been surprised by the utter lack of dismissal and mockery, so far!

khafra

See also Steven Kaas' aphorisms on Twitter:

> First Commandment of the Church of Tautology: Live next to thy neighbor  
And  
> "Whatever will be will be" is only the first secret of the tautomancers.
 

khafra

The story I read about why neighbor polling is supposed to correct for bias, specifically in the last few presidential elections, is that some people plan to vote for Trump but are ashamed of this, and don't want to admit it to people who aren't verified Trump supporters. So if you ask them who they plan to vote for, they'll dissemble. But if you ask them who their neighbors are voting for, that gives them permission to share their true opinion non-attributively.
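A toy simulation of that mechanism, with made-up numbers, just to show the direction of the bias:

```python
# Toy simulation with made-up numbers: shy supporters deny their own vote when
# asked directly, but everyone reports their neighborhood honestly.
import random

random.seed(0)
true_support = 0.50          # assumed true vote share
shy_fraction = 0.15          # assumed share of supporters who won't self-report

voters = [random.random() < true_support for _ in range(100_000)]

# Direct question: shy supporters deny their own vote.
direct = [v and random.random() >= shy_fraction for v in voters]

# Neighbor question: respondents report the support they see around them,
# which on average tracks the true rate regardless of shyness.
neighbor = [sum(random.sample(voters, 10)) / 10 for _ in range(10_000)]

print(f"true support:     {sum(voters) / len(voters):.3f}")
print(f"direct polling:   {sum(direct) / len(direct):.3f}")      # biased low
print(f"neighbor polling: {sum(neighbor) / len(neighbor):.3f}")  # close to true
```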
