Harrison G

Interested in AI alignment, thinking about ethics, tap dancing, playing instruments, and wearing sandals year-round.

Posts

8 · Thinking About Propensity Evaluations

4mo

12 · A Taxonomy Of AI System Evaluations

4mo

11 · Distilled - AGI Safety from First Principles

Comments

[Linkpost] Introducing Superalignment

Harrison G · 1y · 30

The quote: "Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing)."

More ways to spot abysses

Harrison G · 2y · 30

Super helpful; thanks for writing!

On sincerity

Harrison G · 2y · 30

(read: The Athena-Parfit Long-Term Institute for Raising for Effectively Prioritizing Global Alignment Challenges)

I laughed about this for a while. Thank you for this thought-provoking post, and for incorporating occasional humor throughout.

Things I carry almost every day, as of late December 2022

Harrison G · 2y · 10

At the top right is a pocket constitution made by Legal Impact for Chickens. I received this at an Effective Altruism Global conference, during the career fair. What actually happened was that someone came up to the booth I was at holding the pocket constitution, I noted that it looked cool, and they were kind enough to offer it to me. Unfortunately, I have never knowingly met anybody from Legal Impact for Chickens. I have not actually used this pocket constitution, but I carry it anyway in my winter jacket’s inner breast pocket since (a) it fits very unobtrusively and (b) it seems cool to carry around a pocket constitution.

If this was EAG SF, I remember an experience that sounds very similar to this, and I think I was this person! Ha

A Proof Against Oracle AI

Harrison G · 2y · 00

" [...] since every string can be reconstructed by only answering yes or no to questions like 'is the first bit 1?' [...]"

Why would humans ever ask this question, and, furthermore, why would we ever ask it n times? That seems unlikely, and easy to prevent. Is there something I'm not understanding about this step?