Clarifying Your Principles

Raemon

Following up on:

It's easy to end up in situations where people don't know which principles you stand for. (And, which ones you'll really stand for when it's inconvenient).

It's easy to end up in situations where you don't know what principles you stand for, and which ones you'll actually stand up for when it's inconvenient.

You can get away without knowing exactly what your principles are, for a while. But if you want others to trust you in high-stakes situations, it's helpful if you've proactively made your principles clear, and demonstrated that you can live by them. Most people don't really stand up for principles when it's inconvenient, so your prior should be that you probably won't, either.

Being clear about your principles requires you to have them in the first place, and to practice the muscle of standing up for them, so that you know they are real and not just a vague applause light you are professing.

Integrity Debt

In software development, sometimes you need to write messy code that will cause problems for you later on down the line. Writing "good code" would take too long, and you need to ship your product. But, eventually, this messiness is going to make it harder for you to make progress on the codebase, and you'll want to spend some time simplifying your code to "pay down the debt."

Similarly, if you're starting a new organization that depends on trust (either with the public, or particular stakeholders), there's a bunch of actions you might take to build that trust... which you may not have time to do when you're getting started.

Integrity debt accumulates most acutely when you make controversial judgment calls that are hard to explain. Or, judgments that you reflectively wouldn't endorse. I think it also accumulates in small ways, when you do non-controversial but nonetheless confusing things.

But perhaps more importantly (if subtly), integrity debt accumulates when you take on responsibilities that require you to have principles, without yet knowing what those principles are. This may work initially, but eventually will be like building a castle on a foundation of sand. Sooner or later you need to figure out the principles underlying you, your project, or your organization.

If you've undertaken responsibilities without understanding the principles that will guide you in edge-cases, you may find that people have lost trust in you. Or, you may find you have lost trust in yourself.

You may need to pay down your integrity debt. Or, alternately – declare bankruptcy, and transition into strategies that don't depend on trust. Which of these is the right strategy depends on your situation.

Disclaimers:

1. There is a moral element to this, but a lot of my motivation here is figuring out "how do we improve coordination in a low-trust world?". People can disagree on what is morally commendable, but improve their ability to coordinate on net-improvements, even with people they disapprove of.

2. This article jumps back and forth between talking about Integrity, Trust, Accountability and Transparency. These are all different things, and I think you can have each one without the others. But I think they naturally fit together in particular ways that form some obvious strategies.

I'm using 'integrity' and 'accountability' in ways similar to habryka. A quick recap:

When I say “integrity” I mean something like “acting in accordance with your stated beliefs.” Where honesty is the commitment to not speak direct falsehoods, integrity is the commitment to speak truths that actually ring true to yourself, not ones that are just abstractly defensible to other people. It is also a commitment to act on the truths that you do believe, and to communicate to others what your true beliefs are. [...]
The purpose of accountability is to ensure that you do what you say you are going to do, and integrity is the corresponding virtue of holding up well under high levels of accountability. [...]
There is tradeoff between the size of the group that you are being held accountable by, and the complexity of the ethical principles you can act under. Too large of an audience, and you will be held accountable by the lowest common denominator of your values, which will rarely align well with what you actually think is moral (if you've done any kind of real reflection on moral principles).

(Also, Benquo notes: there are multiple things you might mean by integrity. Be careful of conflating "having a strong moral compass" with "conforming to externally imposed authority or social pressure". Depending on your own internal development, this article might or might not be the right advice for you to be considering.)

Dimensions of trust

Not all organizations depend on trust. Sometimes an organization produces an output that can just be directly evaluated. If you build widgets and the widgets obviously work, trust is mostly irrelevant.

But often, an organization depends on some kind of buy-in. If you build custom widgets that nobody else builds, I might hesitate to switch to using your widgets if I don't trust that your organization will keep producing the same widgets for a long time, or that it won't leverage its monopoly on them to screw me over.

Or, if you are a communication platform, I might only use you to host my content if I'm not worried about the platform acting in ways I disapprove of.

Trust has many dimensions. A few (non exhaustive) examples:

Trust that you have generally good intentions
Trust that your intentions are specifically aligned with mine
Trust that you are competent enough to execute on your project
Trust that you will make the right calls in tough edge cases
Trust that you have consistent policies, so that people can build a model of what the rules are, or what your organization does, or what sort of judgment calls you tend to make. Stability can be important in its own right.

These are each made a bit more complicated by the fact that some organizations have multiple decision-makers, who may have different models and preferred strategies. It's not enough for someone to trust any individual person, they need to trust the collective decision making of the group.

Clarifying your principles publicly won't necessarily mean everyone will agree with your principles, but it helps people model you clearly, so they can decide for themselves.

Stakeholders might include...

Members of your team
Your userbase
Allies in whatever ecosystem you're operating in
Key people that you particularly trust to hold you accountable
Yourself

Actions relevant to integrity and trust may include...

Internal Alignment: Credibly Trusting Yourself

If you don't trust yourself to stick to your principles, it's silly to expect others to do so.

Some suggestions:

Actually think about your goals and principles. Do you even know what you're doing and why? You can't act in accord with principles if you don't have them.

Talk with teammates. Get on the same page about your models and reasoning with your fellow decision-makers.

Write principles up privately. Sometimes, writing things publicly creates situations where you're worried about what people will think of you, and this makes it harder to think. But, at least writing things up privately lets you hold yourself accountable.

Building skills and resources. Integrity requires skills like social courage, resilience and conscientiousness. I think a lot of these are like muscles that get stronger with use. (They also are entangled with limited budgets of weirdness points. The true underlying reality is a bit complicated and includes both models)

Critique-ability and External Trust

If you rely on others trusting you, it can be helpful to...

Write up reasoning publicly. Public reasons allow more people to give you feedback if you've made a mistake. It's also hard to get real public trust if you haven't shared your reasoning.

Act in ways aligned with your principles. If your actions align with your private reasoning, people can notice that you behave consistently and (slowly) build a model of you. If your actions align with your public writing, people can more explicitly validate that your reasoning and actions make sense.

Act in ways that demonstrate competence (technical, epistemic, courage, etc.).

In general, actions signal louder than words. If you've never made a tough call or important tradeoff, people don't know how good you are at making those calls. If you've never made a tough call, you don't know if you're good at tough calls.

You can write up your principles in advance such that people can predict how you will make tough calls, but people may reasonably predict "Their reasoning says X, but their incentives and outside view for organizations in this reference class say Y, and I'm going to go with predicting 'Y'".

Every time you visibly act against your incentives to live by your principles, you make it easier for others to trust you.

Changing Principles Loudly

Jessicata notes:

We can distinguish two things that both fall under what you're calling integrity:
1. Having one's current stated principles accord with one's current behavior.
2. Maintaining the same stated principles over time.
It seems to me that, while (1) is generally virtuous, (2) is only selectively virtuous. I generally don't mind people abandoning their principles if they publicly say "well, I tried following these principles, and it didn't work / I stopped wanting to / I changed my mind about what principles are good / whatever, so I'm not following these anymore" (e.g. on Twitter). This can be quite useful to people who are tracking how possible it is to follow different principles given the social environment, including people considering adopting principles themselves. Unfortunately, principles are almost always abandoned silently.

This seems like a good frame to me. I think making public declarations when you change your principles is an important part of managing your integrity credit.

Renouncing claims of authority, or accountability

One of the things that seems worst to me is when an organization claims, or operates, as if it's a trusted, accountable institution... or that it upholds particular principles...

...and then, whelp, it turns out that it behaves as if it did not have those principles or was not accountable to those people.

A slight variation on this is when an organization doesn't make that claim, but sort of operates in a position where people treat it like it's the natural, Schelling place to defer to... despite the organization not actually putting in sufficient focus to be credibly good at that job. Or, when people sort of assume the organization has a particular principle (which the organization never claimed to have), but where the org still sort of reaps the benefit of people believing that fact.

Transparency, accountability and integrity are work

People looking at an org from the outside often ask for transparency, or accountability. I think they often vastly underestimate how much work this entails, and how much it eats away at the organization actually getting its main job done (by order(s) of magnitude).

(Relatedly: The general public also have unrealistic expectations about how much they can demand.)

So... I think it's pretty important that one way of dealing with Integrity Debt is, instead of paying, declare bankruptcy. "You know what, nope. We are not able to promise the principles / reliability / trustworthiness that we'd hoped to. This concretely means you cannot rely on us the way you might have been. We apologize."

If people are trusting you in ways that you didn't promise, it might still be useful to do something like this. (This kinda sucks and is unfair, but I've found it a useful life skill to notice when people are treating me as if I've made some kind of implicit contract with them, and say "actually, sorry, no, I am not opting into that contract, you cannot rely on me in this way.")

Examples

Clarifying principles and maintaining integrity comes up for me in a few major contexts: 1) my personal life, and 2) working on various rationality community infrastructure projects (most notably working on the LessWrong team, but also things like Solstice and the REACH Panel).

LessWrong Team

For the first couple years of the LessWrong 2.0 team's existence, we were running on integrity credit. We had taken on a lot of responsibility, with some complex decisions on how to manage tradeoffs in moderation and the overall site design. Team members often had different takes on how to make those tradeoffs.

Over time we wrote up our thoughts, which both forced us to get on the same page, and to think through some difficult edge cases. Some examples here:

Habryka's Models of Moderation, and his later Integrity and accountability are core parts of rationality.
Ruby's Speaking for Myself (and subsequent discussion about "When is speaking for yourself a good idea?")
My post Meta-tations on Moderation: Towards Public Archipelago, as well as following up with some problems with archipelago in practice, where I'm in the process of potentially changing my mind. (Note: Both of those posts are a bit old, and the LessWrong team is currently taking stock of how our moderation practices and site-tools have played out)

Confidentiality

A few years ago I was not very good at keeping secrets – I'd sometimes just blurt things out without thinking about it.

I eventually decided on the principle that I don't think people should automatically assume everyone can easily keep secrets. I wrote up a public post about it, and made a habit of proactively having a meta-conversation about it when someone seemed to be wanting me to keep things confidential.

The most confusing piece here was implicit bids for confidentiality, from people who I was afraid were manipulating me, or harming others. This eventually became the Privacy and Manipulation blogpost.

Today I'm both better at keeping secrets, and after some hard-earned lessons I'm also a lot more resistant to manipulation. These increased skills mean I feel less obligated to have proactive, awkward meta-conversations. But people still vary a lot in how skilled they are, and I think erring on the side of slightly-awkward meta conversations is still pretty good for people who are still upskilling.

Friendship

It's easy to end up with a lot of "ambiguous friends", where it's not quite clear how close you are, and how much you'd prioritize each other when times are tough. And among my closer friends, it was also unclear what things we actually deeply valued about the relationship.

Over the past few years I've started treating friendship a bit more like dating. I've thought through what I want out of close friends, and when I meet new people I might be interested in befriending, after the "third date" or so I start mentioning what I'm interested in for long-term close friendship, so we can start setting expectations and figure out whether this is more like a casual friendship or on the "close friendship escalator." (This is still a bit of an experiment and I'm not sure how well it's gone.)

Recap

To summarize everything:

If you want people to trust you – as a friend, as a community organizer, or as a major organization – it's valuable for them to know what your guiding principles are. This requires you to actually know what your guiding principles are.

Figuring out what your principles are is hard work, and harder work if you're in a multi-person organization where people disagree. It may not be worth it in all cases. But I think it's worth considering, and if you decide not to put in the work, it's important to realize this may result in people not trusting you, or feeling betrayed.

When you take on responsibility without knowing what principles you'll actually stand by, I think you're sort of borrowing integrity "on credit", and sooner or later you may want to pay down your tab.

This post is downstream of ideas I gained from Andrew Critch, Duncan Sabien, and Oliver Habryka, and benefitted from a lot of discussion with Elizabeth Van Nostrand (Though none of them necessarily endorse this essay).

MSRayne (2y)

I appreciate the argument for clarifying principles, but I'm still not quite sure exactly how you think it's best to find out what they are, or to write a principle-declaration statement. What all goes into such a thing? Are there lists of principles to pick and choose from? Or a pattern language for building your own?

Raemon (2y)

Good question! This is probably several blogposts. A rough answer, just saying how I personally ended up with principles:

I think the main thing was hanging out with people with principles, and arguing with people-with-principles on the internet. This

a) gave me some object level ideas on principle-directions that might be important (i.e. them arguing for principles I decided were good),

b) gave anti-examples where they argued for things that seemed off/wrong to me, but which got me thinking through complex domains that I realized I'd need to come up with my own principles to navigate.

c) gave me a general flavor of "what is it like to have principles, generally", in ways that generalized.

Concretely, at the LessWrong team we read through Ray Dalio's Principles, as sort of a meditation on "what it's like to have Principles".

It definitely mattered that I work in domains where having principles matters, and comes up often enough to provide some feedback loop.

M. Y. Zuo (2y)

This is a really interesting way of looking at things. I personally have noticed that a lot of moral commendation and condemnation was dependent on in-group/out-group dynamics which themselves, ultimately, were dependent on levels of trust, both ambient and explicit.

In real life the issue can be partly sidestepped because we can usually tell if someone is trustworthy to a rough order of magnitude, i.e. filtering out maniacs and so on, via a lot of non-verbal cues, which are not possible online.

Dagon (2y)

Upvoted, and mostly agreed. I think this undervalues some individuals’ and cultures’ unstated or unexamined principles. Fundamentally, trust is about predictability of cooperation. In the modern world, at scale, this is most easily achieved by explicit legible statements of priority (what’s most important to you or your organization). But it’s not the only way.

Raemon (2y)

Yeah agreed. But I suspect that this is still fairly good advice for the sort of person likely to actually end up reading this post (for a variety of reasons).

Ben Schwyn (2y)

I think this post could use a clearer distinction between when it's talking about individual integrity and organizational integrity. That was somewhat confusing to me on reading, and I was wondering if you were suggesting they operated in the same way. Or, if you were suggesting that, it could be stated directly.

Raemon (2y)

I personally think of them in roughly the same frame. Organizational integrity requires more steps since you have to figure out how to get your organization to even be a coherent-entity in the first place, but it's still basically the same loop in my experience. (I think I wrote this post with the target audience of organizations primarily in mind, because organizations tend to wield disproportionate power. But organizations are made of people and I'd want the individual people to be clarifying their principles in addition to figuring out how to have coherent principles as a group).

I think this post is mostly motivated by interpersonal-coordination-principles, such as honesty, keeping promises, [and not making promises you can't keep], and repaying your debts. This isn't the only kind of principle – there are also principles of aesthetics and craftsmanship and practical rules you follow. But confused/bad coordination principles are more likely to become other people's problem.

JacobW38 (2y)

I appreciated this post a lot. I practice a rigorous mental modification system that operates on a narrow set of principles that I essentially need to uphold in every situation without respite, so I'm closely familiar with the subject matter, and the way you expressed it rings true to me. The more important and pervasive a given principle is to you, the more necessary it is to have an unequivocally clear formalization of it. That way, you know exactly what you're following with no wiggle room, and if anyone asks, you know exactly what to tell them. I can't overstate the value of making those principles public for the purpose of seeking feedback as well; there could always be ways of refining those principles even further that you just haven't thought of.

Raemon (2y)

Thanks!

That said...

I do get some sense that... you might actually be a person who benefits from the opposite of this post's advice? I think most people are underprincipled, but it's also possible to fall into the Too Scrupulous trap. Most principles require some flexibility and interpretation, and I think trying to iron them down too much ahead of time can be wasted motion.

There's also a particular failure mode where Principled People who end up with slightly different principles end up having a hard time coordinating.

I don't know whether any of that applies to you, just some counter-food-for-thought.

The principles I'm alluding to here are purely self-applied, so I don't have to worry about crossing signals with anyone in that regard, but I'll heed your advice in situations where I'm working with aligning my principles with others'. It's also an isolated case where my utility function absolutely necessitates their constant implementation and optimization; generally, I do try to be flexible with ordinary principles that don't have to be quite so unbending.
