Ramana Kumar

Comments

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
Ramana Kumar · 6mo · 51

Let me know when you can receive donations via a UK charity.

Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?
Answer by Ramana Kumar · Sep 26, 2024 · Ω120

Vaguely related perhaps is the work on Decoupled Approval: https://arxiv.org/abs/2011.08827

Consent across power differentials
Ramana Kumar · 1y · Ω120

Thanks for this! I think the categories of morality are a useful framework. I am very wary of the judgement that care-morality is appropriate for less capable subjects, basically because of paternalism.

Consent across power differentials
Ramana Kumar · 1y · Ω230

Just to confirm that this is a great example and wasn't deliberately left out.

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Ramana Kumar · 1y · Ω8104 · Review for 2022 Review

I found this post to be a clear and reasonable-sounding articulation of one of the main arguments that AI development poses catastrophic risk. It helped me with my own thinking to an extent. I think it has a lot of shareability value.

Systems that cannot be unsafe cannot be safe
Ramana Kumar · 2y · Ω360

I agree with this post. However, I think it's common amongst ML enthusiasts to eschew specification and defer to statistics on everything. (Or to defer to datapoints that try to capture an "I know it when I see it" "specification".)

Why do we care about agency for alignment?
Answer by Ramana Kumar · Apr 23, 2023 · Ω460

This is one of the answers: https://www.alignmentforum.org/posts/FWvzwCDRgcjb9sigb/why-agent-foundations-an-overly-abstract-explanation

Teleosemantics!
Ramana Kumar · 2y · Ω120

The trick is that for some of the optimisations, a mind is not necessary. There is perhaps a sense, though, in which the whole history of the universe (or life on earth, or evolution, or whatever is appropriate) will become implicated for some questions.

AI and Evolution
Ramana Kumar · 2y · Ω352

I think https://www.alignmentforum.org/posts/TATWqHvxKEpL34yKz/intelligence-or-evolution is somewhat related in case you haven't seen it.

Posts

39 · Dialogue on What It Means For Something to Have A Function/Purpose · Ω · 1y · 5
50 · Consent across power differentials · Ω · 1y · 12
39 · Refining the Sharp Left Turn threat model, part 2: applying alignment techniques · Ω · 3y · 9
78 · Threat Model Literature Review · Ω · 3y · 4
127 · Clarifying AI X-risk · Ω · 3y · 24
61 · Autonomy as taking responsibility for reference maintenance · Ω · 3y · 3
86 · Refining the Sharp Left Turn threat model, part 1: claims and mechanisms · Ω · 3y · 4
133 · Will Capabilities Generalise More? · Ω · 3y · 39
21 · ELK contest submission: route understanding through the human ontology · Ω · 3y · 2
38 · P₂B: Plan to P₂B Better · Ω · 4y · 17