I aspire to become an alignment theorist. All other details are superfluous; I leave them here anyway for historical purposes.


Introduction

I have a set of questions I habitually ask online acquaintances who pique my interest or whom I want to get to know better. Many want to know my answers to those same questions.

It would be nice to have a central repository introducing myself that I can keep up to date.


Questions

A. What do you care about? 

  • What are you passionate about? 
  • What animates you?

B. What do you think is important?

C. What do you want/hope to do with your life?

D. What do you want/hope out of life?

E. Where are you coming from?

  • Context on who you are/what made you the person you are today.

F. How do you spend your time?

  • Work, volunteer, education, etc.
  • Basically, activities that aren't primarily for leisure/pleasure purposes.

G. What do you do for recreation/leisure/pleasure/fun time?

  • What are your hobbies?

Answers

A.

 What do you care about? 

  • What are you passionate about? 
  • What animates you?

I care about creating a brighter future for humanity. I believe a world far better than any known to man is possible, and I am willing to fight for it.

I want humanity to be fucking awesome. To take dominion over the natural world and remake it in our own image, better configured to serve our values.

I want us to be as gods.

I outlined what godhood means for me here.

I think that vision is largely what drives me, what pushes me forward and keeps me going.

 

B.

What do you think is important?

 

Mitigating Existential Risk/Pursuing Existential Security

The obvious reasons are obvious.

But I am personally swayed by astronomical waste. I don't want us to squander our cosmic endowment. Especially because our future can be so wonderful, I think it would be very sad if we never realise it.

 

Promoting Existential Hope

I want to give people a positive vision of the future they can rally around and get excited by. Something that makes them glad to be alive. Eager to wake up each day. A goal to yearn for and aspire to.

To reach out to with relentless determination.

 

I'd like to communicate that:

  • The current state of the world is immensely better than even just three centuries ago
    • Life expectancy has doubled
    • Economic progress
      • The poverty rate has drastically fallen
      • Material abundance and comfort
      • Much faster and more reliable transport and communication
      • Etc.
    • Social progress
      • Slavery abolition
      • Women's suffrage
      • Spread of liberal democracy
      • Etc.
    • Etc.
  • Vastly better world states are yet possible
  • We can take actions that would make us significantly more likely to reach those vastly better states
  • We should do this

 

AI Safety

I believe that safely navigating the development of transformative artificial intelligence may be the most important project of the century.

Transformative AI could plausibly induce a paradigm shift in the human condition.

To explain what I mean by "paradigm shift in the human condition", I think we may see GDP doubling multiple times a year later this century.

(Depending on timelines and takeoff dynamics, doubling periods of a month or even shorter seem plausible.)
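To make the arithmetic behind those doubling times concrete, here is a minimal sketch, assuming constant exponential growth (a simplification). The function names are illustrative, and the 3% annual rate used for comparison is an assumed round number, not a figure from this post.

```python
import math


def annual_growth_factor(doubling_time_months: float) -> float:
    """Factor by which output multiplies over one year,
    assuming a constant doubling time (in months)."""
    return 2 ** (12 / doubling_time_months)


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed to double at a constant annual growth rate
    (e.g. 0.03 for 3% per year)."""
    return math.log(2) / math.log(1 + annual_growth_rate)


if __name__ == "__main__":
    # Doubling twice a year, four times a year, and every month.
    for months in (6, 3, 1):
        factor = annual_growth_factor(months)
        print(f"doubling every {months} month(s): x{factor:,.0f} per year")

    # Assumed, illustrative comparison rate of 3% per year.
    print(f"doubling time at 3%/year: {doubling_time_years(0.03):.1f} years")
```

Under these assumptions, doubling twice a year means a fourfold increase annually, and a one-month doubling period means roughly a 4,000-fold increase annually, against a doubling time of a couple of decades at 3% growth.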

I'd like to approach AI safety from an agent foundations perspective (I think agent foundations work is neglected relative to its potential value and is a better fit for me). In particular, agent foundations solutions to alignment seem more likely to be:

  1. Robust to arbitrary capability amplification
    1. "Treacherous turns" seem less likely to be a challenge.
    2. An agent becoming more capable wouldn't make it any more able to violate theorems.
  2. Propagated across an agent's genealogy
    1. Aligned agents could only create children that they believed to be "agent foundations aligned".
    2. By induction this is propagated across all an agent's descendants.
  3. Robust to arbitrary self-modification (up to iteration)
    1. Self-modification can be viewed as a special case of creating a child agent.
    2. Aligned agents would only self-modify if they believed they would remain "agent foundations aligned" after self-modification.
  4. Capable of recursive amplification
    1. If we understand how to design very reliable agents that can in turn design very reliable agents of their own, we've created an agent that's capable of safe recursive amplification (a "seed" AI).

 

Agent foundations style approaches aren't the only approaches I'm considering, but it's what I plan to start out with.

 

C.

What do you want/hope to do with your life?

I want to maximise my expected positive impact on the world conditioning on who I am and the resources available to me:

  • Human capital
  • Social capital
  • Intellectual endowment

(The net present values thereof.)

And on the broader effective altruist community (I consider myself a member of the effective altruist community, so I should take actions from the perspective of maximising positive impact of the effective altruist community, not maximising my personal positive impact. Or rather, I believe that as a matter of normative decision theory, this is how best to maximise my values.)

 

I think my comparative advantage is something like:

  • I can write clearly and at length
  • I have good epistemics and can attain excellent epistemics
  • I can learn maths
  • I can devour a lot of information (via audio)
    • I've completed six non-fiction books this month and expect to finish ten by the time the month is over.
  • I'm young?
    • At 24, I can pivot my career to basically any trajectory I please.
  • I have some slack
    • I can afford to "fail", and not worry too much (single [no dependents otherwise], young, decent parental SES).
    • If I spend years chasing shadows, it's not a life ruining mistake.

 

Well considered, my plan to improve the world is something like "learn a lot of useful stuff then exploit that knowledge to make intellectual contributions to the most important problems confronting human civilisation".

 

Mathematics

I plan to learn a fuckton of abstract mathematics.

I want to learn the fundamental structure of reality. To become good at abstract thinking and formal reasoning. To abstract about abstraction itself. 

Studying abstract maths seems like it would be useful.

By "formal style reasoning", I'm referring to reasoning within formal systems: logics, mathematical models, computer programs/algorithms, explicit ontologies, other abstractions, etc.

I want to become good at building up a formal model of a domain (picking "good" models ["all models are wrong, but some are useful", generating "maps that better reflect the territory", "carving reality at its joints", etc.]), and navigating that model (making correct inferences, deriving insight about the underlying domain from the model, making predictions about the domain from the model, explaining the domain through the model, building up intuition about the domain from the model, deriving knowledge about the domain via the model, etc.).

I'd basically try to autodidact my way to the competence of a professional mathematician (e.g., an algebraic abstractologist).

I expect this would make me more proficient at agent foundations work, and some of the other important problems confronting human civilisation (now or later this century).

 

Computation

I have a bunch of questions that I want to resolve:

  1. On a fundamental level, what is "computation"?
    1. On a fundamental level, what is "information"?
  2. How are mathematics and "computation" related?
    1. How are computation and "knowledge"/"semantic content" related?
    2. Does a given computational state uniquely characterise semantic content?
  3. How are computation and physics related?
    1. How are information and physics related?
    2. How reducible is computation to physics?
      1. In general?
      2. Within our universe?
    3. Is a given computational "state" uniquely characterised by a given physical "state"?
  4. How do the "limits of computability" depend on the "laws of physics" of a given universe?
    1. What are the limits of computability within our universe?

I stumbled on the above questions when trying to dissolve: "is consciousness reducible?"

 

General

I want to learn how the world works. I want to build a rich and coherent world model of human civilisation and of the physical reality we inhabit.

I would like to use that model to figure out how to uplift said civilisation. I'll advocate for said uplifting (ideally through my writing).

 

Artificial Intelligence

Stuff I'd like to understand on a fundamental level:

  • Cognition/general intelligence
  • Optimisation
  • "Learning"
    • Statistical learning
    • Algorithmic learning
    • Learning in humans
  • Epistemics
    • Probability theory?
    • Knowledge acquisition and representation
    • World modelling
  • Agency
    • What makes a system an "agent"?
    • Decision theory
    • Game theory
  • Values/motivations
    • Instrumental vs terminal
    • Orthogonality
    • Human values
  • Morality
    • As a phenomenon of human evolution (biological, sociocultural and memetic)
    • A comprehensive theoretical framework for (arbitrary?) agents
      • Emergent game theory?

Use this understanding to assist in the project of safely developing transformative artificial intelligence.

I want to take a stab at agent foundations style approaches to alignment while I'm still young (< 40; do mathematicians really stop making novel contributions after their 40th birthday? I don't know, but I'll try to milk my cognitive youth for all it's worth).

 

Digital Minds

Stuff I'd like to learn at a fundamental level:

  • Neuroscience
  • Human cognition
  • Consciousness

 

I'd try to use this understanding (coupled with an understanding of computation) to solve technical/practical (especially safety/security/robustness/reliability/assurance) problems/challenges related to digital minds.

I'll probably write whitepapers specifying infrastructure to support human uploads. Depending on how AGI goes, I might get involved in human upload projects.

Eventually, advocate for transitioning to primarily digital substrate.


I expect to switch to working on digital minds after AI safety. Ideally, I'd switch when I think my marginal impact from AI safety work would be below my marginal impact from digital minds work. This might be because:

  • We "solve" alignment or figure out a good enough theory
  • My research agenda/approaches reach dead ends, and I'm unsure how to proceed
  • I grow disillusioned in my ability to usefully contribute to the field
  • I come to believe I can make more meaningful contributions to digital minds
  • Etc.

 

Future

The career trajectories I described above are conditional on a normal human lifespan (e.g. I expect to have retired in 50 years). If I was able to attain indefinite life, there's a lot of stuff I'd like to do. But it's still the same basic template of "learn a lot of useful stuff then exploit that knowledge to make intellectual contributions to the most important problems confronting human civilisation".

I covered them in this Twitter thread.

 

D.

What do you want/hope out of life?

In no particular order:

  • Romantic fulfillment
    • I'd eventually like to get married I think
    • Probably have children
  • Friendship and companionship
    • I'm chronically lonely, and it's suffering
  • To make a positive difference and to leave an impact on the world
  • If I die, to leave a legacy to be remembered by
    • I don't want to be forgotten
  • Material abundance and comfort
    • I'd enjoy the pleasures of life I think
  • Prestige, status, honour, and glory
    • Monkey brain go brr....

 

 

E.

Where are you coming from?

  • Context on who you are/what made you the person you are today.

(Oh wow, this one is pretty extensive. This section took me the longest to complete.)

 

I'm an immature brat who hasn't lost their childlike wonder and enthusiasm. I cherish the dreams of childhood ambition and reach out towards a brighter world.

 

I've written a few reflections on different aspects of my person. Some of them that I don't currently disavow:

 

F.

How do you spend your time?

  • Work, volunteer, education, etc.
  • Basically, activities that aren't primarily for leisure/pleasure purposes.

Currently (the last week or two):

  • I'm unemployed
    • I quit my job as a web developer at the end of July
    • I expect to start a CS Masters at the end of September
      • This is contingent on visa approval
  • I spend most of my waking hours consuming long form informational audio (audiobooks, podcasts, audio papers, audio blogs, etc.) in the background.
  • I spend a lot of time discussing whatever I've been nerd sniped by on Twitter or Discord
    • Mostly Twitter
  • I sometimes learn some maths or research some stuff that I've been nerd sniped by
    • For maths, it's currently category theory

 

I have a lot I want to learn about, and so I let my muse and mental stamina dictate what I learn about.

Curiosity driven learning is probably more productive, I guess.

(I also have severe attention challenges, so forcing myself to focus against my muse is probably unproductive.)

 

G.

What do you do for recreation/leisure/pleasure/fun time?

  • What are your hobbies?
  • Chat with people on Twitter (sometimes on Discord)
    • When I'm not devoting most time to whatever I've been nerd sniped by, I reliably spend several hours in my DMs
    • Alternatively, I spend more time in DMs when I get bouts of intense loneliness
  • Binge fiction
    • I spend considerable time discussing Tower of God and generally being involved in the fandom.
    • Right now, it's on hiatus. 
  • Working out
    • For better or worse, I've become something of a gym rat

 

I don't have a meatspace social life and haven't had one since I graduated university. I want to change that soon though.

(One of the reasons I'm excited about returning to formal education is to have a meatspace social life again. Returning to school did not change this.)
 

I (still) expect I'd enjoy meatspace human contact.


Any particular reason you've linked all those tweets, but blocked general access to them? I'd probably be interested in reading some of those threads just going by the titles.

Oh, it's not clear but this thread was originally written over 6(?) months ago (my Twitter was public back then).

I just updated it today to add the top line.

I have this thread linked in my Twitter bio (and I guess that's where most visitors to it come from), so that's the main use case.

I don't really blog outside LW, so don't have a separate home page.

 

Sorry that you can't see the linked tweets but:

  1. I'm not actively maintaining this thread, and I hadn't updated it for several months before a sudden desire to make it more prominent that I'm focusing on AI safety
  2. This thread's modal reader is someone who saw me on Twitter, not someone finding this blog post by browsing LW blog posts

 

I do not plan to make my Twitter account public (I accept follow requests by default, but there are many bad incentives of public Twitter that I enjoy being shielded from).