Gunnar_Zarncke

Software engineering, parenting, cognition, meditation, other
LinkedIn, Facebook, Admonymous (anonymous feedback)

Comments

What is your current thinking on how shards lead to plans?

"It's from an old story titled Samsara."

The link points to ACX?

So the safety measures they institute need to be:

Cheap [and] Low compliance overhead

There is an additional alternative: safety measures that make the main product more useful, e.g., by catching failure cases such as jailbreaks.

Granting powerful artificial agents personhood will be less of a social problem (they can fend for themselves) than granting it to minimal agents. Whatever criteria we agree on can likely be used to engineer minimal agents: agents that are just barely aware of being agents but have little computational capability beyond that. What if somebody puts these into a device? Can the device then no longer be turned off? Copied? Modified?

Except where it is not, or rather, where it has saturated at a low level. Almost all growth is logistic, and it is difficult to extrapolate logistic functions.
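
A minimal sketch of why that is hard (all numbers are illustrative, not from any dataset): in its early phase a logistic curve is nearly indistinguishable from an exponential, so a curve fit to early data can overshoot the true saturation level by orders of magnitude.

```python
# Illustrative only: early logistic data looks exponential, so naive
# extrapolation misses the saturation entirely.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    return K / (1.0 + np.exp(-r * (t - t0)))

def exponential(t, a, r):
    return a * np.exp(r * t)

K, r, t0 = 100.0, 0.5, 20.0        # true ceiling, growth rate, midpoint
t_early = np.linspace(0, 10, 50)   # we only observe the early growth phase
y_early = logistic(t_early, K, r, t0)

# An exponential fits the early phase almost perfectly...
(a_hat, r_hat), _ = curve_fit(exponential, t_early, y_early, p0=(0.01, 0.5))

# ...but its extrapolation is off by orders of magnitude once growth saturates.
t_future = 40.0
print(logistic(t_future, K, r, t0))         # ~100 (saturated)
print(exponential(t_future, a_hat, r_hat)) # ~2e6 (runaway)
```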

[^1]: Without loss of generality, the same applies if market B had a closer-to-true probability estimate than A.

You can insert real footnotes in the LW editor by selecting text and using the footnote button in the popup menu.

There is a "more" missing at the end of the sentence.

I just saw a method to make more parts of the model human-legible, addressing the main concern.

Reasoning in non-legible latent spaces is a risk that can be addressed by making the latent spaces independently human-interpretable. One such method is LatentQA: Teaching LLMs to Decode Activations Into Natural Language. Such methods have the advantage of making not only the output layer human-readable but potentially other parts of the model as well.
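
A rough sketch of the general pattern behind such activation-decoding methods (not the paper's actual API; the choice of GPT-2, the layer index, and the `decoder` call at the end are illustrative assumptions): capture hidden activations at an intermediate layer with a forward hook, then hand them to a separately trained decoder model that answers natural-language questions about them.

```python
# Hedged sketch: capture intermediate activations, which a LatentQA-style
# decoder would then translate into natural language. Decoder training is
# omitted; model and layer choices are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in target model
model = AutoModelForCausalLM.from_pretrained("gpt2")

captured = {}

def hook(module, inputs, output):
    # For GPT-2 blocks, output[0] holds the hidden states:
    # shape (batch, seq_len, hidden_dim).
    captured["acts"] = output[0].detach()

# Read activations at a middle layer (layer 6 of GPT-2's 12).
handle = model.transformer.h[6].register_forward_hook(hook)

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    model(**inputs)
handle.remove()

acts = captured["acts"]
print(acts.shape)  # (batch, seq_len, hidden_dim), e.g. torch.Size([1, 5, 768])

# A trained decoder would then answer questions about these latents, e.g.:
#   answer = decoder("What is the model thinking about?", acts)  # hypothetical
```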
