Emrik

Hi, niplav!  You might be wondering, "what's this about?", and you would be right.

You see, I was going to write this to you on Schelling.pt (⛯) in response to our chat about lie-detection and the future of human civilization, but instead I am writing it here.  I also considered writing it as an email, and I really liked that idea.  But I changed my mind about that too.  "Nah", I said.  "I'll write it here instead."


Postscript

In the context of trying to make society more honest, I wonder to what extent clever truthfwl mechanisms (re mechanism design) could be important for facilitating the civilizational adoption of / managing the transition to brain-scanning credibility-tech.

What I actually should have written in that sentence is:  "I have some maybe-interesting maybe-important things which I felt inspired to say to you, but I don't have a smooth-simple way to make them seem coherently relevant, so I'll just say the things."  But instead I tried really hard to find a sentence which would justify why I was talking about the things in the first place, especially since I now decided to write it on LessWrong!

Tangentially relevant background preposterous claims which I will not be defending here:

Epistemic status… checks out!

The magic shop in the middle of everywhere: cheap spells against egregores, ungods, akrasia, and you-know-whos!

Anyway, Vickrey auctions feel to me like an almost magical solvent for many societal problems.  They've got more practical applications than assurance contracts, and I'm already excited about those.  So I asked Sonnet whether there were more patterns like that where it came from:

👀

Idk what those things are yet, but I smell structure within which I intuit a kernel with which I can generate many such solutions to the problems I care about.  If the textbook offers a variety of applications for which there are proofs, the real-life applicability of the kernel likely expands beyond that (once adequately grokked).  I suspect I may already be unhappily familiar with problem(s) "Strategyproof Task Scheduling" aims to solve.

Deceptive strategy
  Trying to make other people believe false things about your preferences.

The meta-pattern is:  If a game has a Nash equilibrium for deceptive strategies, it is sometimes/often feasible to change the game rules/mechanics such that the deception is an automated baked-in part of the game.  This can be win-win-win if eg the baked-in "deception" is now symmetric so you avoid deadweight losses or smth.  At the very least, it means I no longer gain anything from doing epistem/-ic/-ological warfare on you.

Like, in a first-price auction, I have an incentive to bid just above the second-highest bid, even if I value the item much more than that.  And I want to act as if I value the item less than I do, so that you think you can get away with bidding much less than your true value, so that I can get away with bidding just above whatever you think is just above what I'm bidding.

First-price sealed-bid auction
  The highest bidder gets the item for the price of the highest bid.
  And nobody knows what the other bids are until the end.

Thus Vickrey spake:  Just short-circuit that whole Death Note drama by declaring that the highest bidder only has to pay the second-highest bid.  (The general move here is something like: infer the limit of what deception can provide the players, and declare that as the automatic outcome of the game.)

Second-price sealed-bid auction (Vickrey auction)
  The highest bidder gets the item for the price of the second-highest bid.
  And nobody knows what the other bids are until the end.

This spell has Nice Properties 🪄 ✨

  1. I don't have to worry about "overpaying" relative to the competition, so I can just bid my true value (ie indifference price) for the item.
  2. The trade is surplus utility for me, despite having revealed my indifference price.  In fact, it gives me the maximum surplus/rent I could have got, assuming the counterfactual is a first-price auction with the same bids.
    1. (Granted, the bids would not stay the same if it were a first-price auction, so this is an unsafe simplification.)
  3. The trade is also surplus utility for the seller, as long as the auction starting price is their indifference price.
    1. Though they don't net as much as they could have from a first-price auction (assuming same bids).
      1. This is usually fine and good, however, because most things sold are sold by richer people.  And rich people usually have lower marginal utility on money, so mechanisms that favor the buyer (still within bounds of what's positive sum) are usually higher net utility.
        1. (Unfortunately, the rules of the game are usually also set by richer people, so Vickrey-ideal markets are rare in the wild.)

Interestingly, in Vickrey, the bids have to be sealed in order for it to work.  The incomplete information people have about each others' bids is now causing the incentive to reveal true preferences, whereas before it was making people want to hide them.
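
To make property 1 above concrete, here's a toy sketch (all bids and payoffs are made-up numbers, purely for illustration) of how the two auction rules treat the same bids: under second-price rules none of the tried deviations from my true value beats just bidding it, while under first-price rules shading my bid down is what pays.

```python
# Toy comparison of first-price vs second-price (Vickrey) sealed-bid auctions.
# All names and numbers here are hypothetical, not from anywhere in particular.

def first_price_utility(my_value, my_bid, other_bids):
    """I win if I'm the highest bidder, and pay my own bid."""
    return my_value - my_bid if my_bid > max(other_bids) else 0.0

def second_price_utility(my_value, my_bid, other_bids):
    """I win if I'm the highest bidder, but pay only the second-highest bid."""
    return my_value - max(other_bids) if my_bid > max(other_bids) else 0.0

my_value = 100.0           # my true indifference price
other_bids = [70.0, 55.0]  # whatever the others happen to bid

# Second-price rules: none of these deviations beats simply bidding my true value.
truthful = second_price_utility(my_value, my_value, other_bids)   # 30.0 surplus
for deviation in [60.0, 80.0, 90.0, 110.0, 130.0]:
    assert second_price_utility(my_value, deviation, other_bids) <= truthful

# First-price rules: bidding my true value earns me nothing; shading the bid
# down toward just above the highest competing bid is what earns surplus.
print(first_price_utility(my_value, my_value, other_bids))  # 0.0
print(first_price_utility(my_value, 70.01, other_bids))     # ~29.99
```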

Mechanisms for progressive redistribution are good because money-inequality is bad

In games/markets where it's the sellers who tend to be poorer (eg in employment), we may instead want mechanisms slightly biased in favor of seller-surplus.  The more general heuristic here is just to aim for progressive redistribution of money, so that when people spend their comparative advantages on what's profitable for them, that happens to allocate them where they generate the most utility for others too.

Money-inequality breaks the symmetry (dilutes the signal) between [willingness-to-trade] & [surplus utility from trade], so movement in the direction of equality is a step toward making money more incentive-compatible with altruism.

And with extreme inequality (viz the current state of the world), voluntary exchange of money can be a minor catastrophe.

To see this, consider a limiting-case utopian economy where everybody has ~equal money, and financial incentives drive all of them to allocate their work-hours to where people are happiest to receive them.

Now I fly in from space with a bajillion money in their currency, and I just really have a craving for an extravagant pyramid that day.  To my surprise, people seem really interested in this green paper stuff I've incidentally got dusting in my luggage.  So I strike a deal:  they build me a pyramid, and I give them the papers.  Seems unambiguously like a win-win, no?

🦊🚀🪐💰🏗️

Only if we ignore opportunity costs.  It may indeed be positive-sum for the thousands of workers I employed, but in the world where they didn't spend their time digging dirt, they would have spent that time doing things which were much higher utility for thousands of other people.  I'm the only rich person around they can waste their work-hours on, so trading with me was just about the worst way they could have spent their time.
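
A toy back-of-the-envelope version of the opportunity-cost point, with entirely made-up numbers:

```python
# Hypothetical utilities per worker-hour. Serving roughly-equal-wealth neighbors
# generates a lot of utility for them; building my pyramid generates the worker's
# wage-utility plus a little pyramid-enjoyment for me.
utility_per_hour_serving_neighbors = 10
utility_per_hour_on_pyramid = 3

hours = 1_000_000  # thousands of workers over many days

generated_by_the_trade   = hours * utility_per_hour_on_pyramid       # positive-sum for us...
generated_counterfactual = hours * utility_per_hour_serving_neighbors
net_effect_of_my_pyramid = generated_by_the_trade - generated_counterfactual

print(net_effect_of_my_pyramid)  # -7000000: a big net loss once opportunity costs count
```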

Money will destroy us
—but it's a slow demise, and the canaries died long ago.

Be not the boiling frog.


Emrik

(reminder: I find it easier (and epistemologically healthier) to write to specific people instead of a general crowd.)


Aborted draft response to Robin re combining assurance contracts with lie-detection

I was going to reply to Robin re this tweet, but I have a new workflow that drastically reduces the number of tasks I'm able to complete, and I grew too ambitious wrt what I wanted to explain, so I thought it'd be better to at least give you my draft instead of archiving this task permanently.


Anecdote about what makes me honest while nobody's looking

Personally, probably the biggest part of what keeps me aligned with my stated values/intentions while nobody's looking (aka "out-of-distribution"), is that I want to be able to truthfwly say—while people are looking (aka "in-distribution")—that I've been consistent with my stated values while nobody was looking.

When I consider cheating some rule I know nobody can check, I think to myself "but then I'll lose the ability to honestly claim that I haven't cheated, and I really cherish the simplicity and feeling-of-righteousness of that… or smth." ¹

The point is:  *You don't need to constantly wear a lie-detector in order to constantly incentivize honest behavior.*  Your ability to make believed-or-unbelieved sentences that reference your past, enables incentives from probable-future lie-detection to touch many more contexts than they're explicitly used in.  See [spot-checking]() & [indirect reciprocity]().
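
A toy expected-value version of that point (numbers entirely made up): because a single future lie-detected question like "have you been complying?" covers all of my past at once, even a modest chance of ever being asked is enough to make cheating today a bad deal.

```python
# Hypothetical numbers, purely illustrative.
gain_from_cheating_today = 10.0
p_ever_asked_about_it    = 0.3    # chance a future lie-detected claim touches this episode
cost_if_it_does          = 100.0  # fine, lost trust, lost ability to honestly claim a clean record

expected_cost = p_ever_asked_about_it * cost_if_it_does  # 30.0
print(expected_cost > gain_from_cheating_today)          # True: cheating doesn't pay in expectation
```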

¹ Fun fact:  As a ~13yo kid, I really wanted God to exist just so that they could constantly monitor my behavior, since I'd noticed that I tend to behave better whenever people can see me.  God seemed implausible, however, so instead I just imagined my friends looking through the windows.  (Especially Sanna, since I cared more about her impression of me.)

Premises

assurance contracts aim at solving the assurance problem

Note to prevent misunderstanding:  "Lie detection" does not necessarily mean "let's catch the bad guys!", and I'm much more interested in the aspect of "I can voluntarily use this to utter credible statements!"

To get my perspective, you need at least these premises:

  1. Coordination problems, pluralistic ignorance (common-knowledge bottlenecks), and Keynesian culture bubbles ("beauty contests") are civilizational-scale problems with the possibility of technical solutions.
    1. Importance is high because even if AI-alignment gets "solved" (in the sense of corrigibility), that may not obviate the need for human thinking on the above problems now.  By then it might be too late, because AI-proliferation & speed builds up dependency-debt much faster, and refactoring the system becomes intractable after that.
  2. "Assurance democracy" (a strategy for civilizational organization inspired by assurance contracts, but not adequately summed up by saying "just use assurance contracts!") may solve most of the above problems, if only people could be assured that the contracts they sign with others are actually upheld.
  3. Lie-detection solves the problem that bottlenecks assurance contracts from solving the aforementioned problems.

Assurance contract fictional example

By "assurance contract", I'm trying refer to something like the example below.  Lie-detection make contracts like these feasible and cheap to verify.

☐ I will begin doing X at time Tₓ if at least N others from group G sign this contract before expiration date Tₖ.

☐ I agree to have my ongoing compliance randomly spot-checked with a registered Veracity™-device at least once a year, and I will pay a $Y fine if I fail to report that I'm complying.

(All signees are required to have their intentions confirmed using a verified Veracity™ during signup.)

(Sunset: 5 years at a time.)
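
A minimal sketch of the activation condition in a contract like the one above (names, dates, and thresholds are invented, and the Veracity™-style intent check is only a stub):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Signature:
    signer: str
    group: str
    signed_at: datetime
    intent_verified: bool  # stub for the lie-detected intent check at signup

def contract_activates(signatures, group_g, n_required, expiry):
    """The contract binds only if at least N verified members of group G
    signed before the expiration date; otherwise nobody owes anything."""
    valid = [s for s in signatures
             if s.group == group_g and s.intent_verified and s.signed_at <= expiry]
    return len(valid) >= n_required

# Invented usage example:
sigs = [
    Signature("alice", "G", datetime(2030, 1, 5), True),
    Signature("bob",   "G", datetime(2030, 1, 9), True),
    Signature("carol", "H", datetime(2030, 1, 7), True),  # wrong group, doesn't count
]
print(contract_activates(sigs, "G", n_required=2, expiry=datetime(2030, 2, 1)))  # True
```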

In an "assurance democracy", anybody can define the parameters of their contracts as they wish, and people can self-organize into whatever laws they want.

Evil dictators win by preventing the population from organizing against them;  a central platform for cheaply generating common-knowledge-assurance and atomic-commit coordinated action against them prevents this.  (not optimizing sentences atm...)

Another fictional example from a time I wanted to organize a coworking event on EAGT (except I never actually launched this, so the numbers are fictional):

Lets you set up multiple types of events to check what sort of thing people wish to commit to;  and people don't have to worry about committing to something that doesn't work out, since it only happens if enough people commit.

Assurance contracts against pluralistic ignorance (Keynesian culture-bubbles)

Lie-detection + assurance contracts give ppl the ability to cheaply generate common knowledge.  Required for coordinating on stuff, ∀stuff.

public goods bottlenecks

Pop bubbles of pluralistic ignorance (aka Keynesian culture-bubbles, wherein everyone knows everyone's pretending, but still enforces the norm bc failure-to-pretend is itself enforced against: "and merry-around we go!") (see also "simulacrum levels").

[Smart and Clever thoughts on indirect reciprocity and plausible deniability should go here;  but if I just mention words like that, I may activate concepts in your head that could generate the relevant patterns.]

(Not super-relevant to saving the world, but just really neat: assurance contracts allow you to refactor language with atomic-commit pre-agreed-upon updates.)

Assurance democracy obviates the need for VCG, because the "truthfwl mechanism" is just a device you can strap on

I've been calling this sort of stuff an "assurance democracy", but please find a better name for it.  Consider it an alternative or supplement to futarchy if you will.


Re fertility & assurance democracy

And wrt Robin's last question "How for example would they increase fertility?", I don't really have detailed thoughts, and didn't think it was a priority to generate them on-the-spot.

My hope is that we can globally coordinate (via assurance) to regulate fertility (aka self-copying), and commit to disincentivize defectors, because that seems the obvious choice once you have the ability to globally coordinate on anything.  If a ceiling is assured, cultures don't have to sacrifice values in the competition to be the most self-copying.

niplav

> The meta-pattern is:  If a game has a Nash equilibrium for deceptive strategies, it is sometimes/often feasible to change the game rules/mechanics such that the deception is an automated baked-in part of the game.  This can be win-win-win if eg the baked-in "deception" is now symmetric so you avoid deadweight losses or smth.  At the very least, it means I no longer gain anything from doing epistem/-ic/-ological warfare on you.

Yep, your intuition is completely correct! In mechanism design this is called the revelation principle.

> Interestingly, in Vickrey, the bids have to be sealed in order for it to work. The incomplete information people have about each others' bids is now causing the incentive to reveal true preferences, whereas before it was making people want to hide them.

On my last meditation retreat I spent a while thinking about this in the context of *salary negotiations*. Current SOTA of salary negotiations appears to me to be highly suboptimal, favoring disagreeableness, deception, time-wasting &c.

I want to add this to https://niplav.site/services.html as a new pricing scheme; the short version is very similar to a sealed-bid auction, but symmetric in this case:


The buyer pre-registers (e.g. via hashsum) their maximum willingness-to-pay b, the seller pre-registers their minimum willingness-to-be-paid s. If b ≥ s, the deal takes place, and the actual price is (b+s)/2 (i.e. splitting the difference). If b < s, no deal takes place.

This is fun because the numbers can be negative, and it allows for price discrimination, the best kind of discrimination.
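
A minimal sketch of the scheme (the hash pre-registration is only stubbed, and all names and numbers are placeholders):

```python
import hashlib

def commit(amount: float, salt: str) -> str:
    """Pre-register a number without revealing it (simple hash-commitment stub)."""
    return hashlib.sha256(f"{salt}:{amount}".encode()).hexdigest()

def settle(buyer_max: float, seller_min: float):
    """Symmetric sealed-bid pricing: trade iff the buyer's maximum willingness-to-pay
    is at least the seller's minimum willingness-to-be-paid, at the midpoint."""
    if buyer_max >= seller_min:
        return (buyer_max + seller_min) / 2  # split the difference
    return None  # no deal

print(commit(120.0, "buyer-salt")[:16])  # what the buyer would publish up front
print(settle(120.0, 80.0))   # 100.0
print(settle(50.0, 80.0))    # None: no deal
print(settle(30.0, -20.0))   # 5.0: negative numbers work too
```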

> Now I fly in from space with a bajillion money in their currency, and I just really have a craving for an extravagant pyramid that day.  To my surprise, people seem really interested in this green paper stuff I've incidentally got dusting in my luggage.  So I strike a deal:  they build me a pyramid, and I give them the papers.  Seems unambiguously like a win-win, no?

Hah, I've argued this is a bad thing in other (real-life) discussions, as in "it's better if billionaires buy old paintings as opposed to yachts". Never considered it as an argument against wealth-inequality, but it's a good one.

My inner libertarian speaks out: "Why don't all the other people ignore the bajillion green papers in the luggage and instead shift to an equilibrium with a new currency? After all, *they have all the labor-power*."

I think redistribution is good, probably even above a citizen's dividend from LVT, but I want to keep redistribution out of mechanisms—it's a cleaner and more modular design, so that we don't have to futz with our complicated mechanisms when adjusting the redistribution parameters.

> The point is:  *You don't need to constantly wear a lie-detector in order to constantly incentivize honest behavior.*  Your ability to make believed-or-unbelieved sentences that reference your past, enables incentives from probable-future lie-detection to touch many more contexts than they're explicitly used in.  See [spot-checking]() & [indirect reciprocity]().

One thing that makes me a bit less optimistic about lie detectors than you is that I think that most/~all? people don't have internal representations of over-time-stable beliefs about the world (or, god forbid, their own actions—see procrastination paradoxes).
(Predictions about your own actions will kind of diagonalize against your actions, which is why we need a formalization of Steam at some point.)

I think that it'll work well in contexts where the commitment is short-term, and relatively small-scale, and something the lie-detectee has experience with.

I don't think I understand your premises, but they seem fascinating. Will ponder.



Ah, I think now I get it! Lie detection allows for credible commitments of the form "I will uphold this assurance contract, even if the respective infrastructure is unstable/under coalitional attack". Is that right?

I think there are (there were?) people working on dominant-assurance contract website(s). I wonder what happened with those.

Plausible deniability indeed invokes ~*~concepts~*~ in my head, it's the kind of thing that for me is always at the boundary of formalizability. Slippery bastard :-)

> (Not super-relevant to saving the world, but just really neat: assurance contracts allow you to refactor language with atomic-commit pre-agreed-upon updates.)

…what? I don't understand this ^^

> I've been calling this sort of stuff an "assurance democracy", but please find a better name for it.  Consider it an alternative or supplement to futarchy if you will.

I don't know whether I'm misunderstanding here, but with a setup like "In an "assurance democracy", anybody can define the parameters of their contracts as they wish, and people can self-organize into whatever laws they want", doesn't one get major problems with negotiations between incompatible/conflicting assurance contracts? Similar to how in polycentric law all legal systems need to negotiate between each other, requiring O(n²) pairwise negotiations, as opposed to 0 in a centralized system.

> And wrt Robin's last question "How for example would they increase fertility?", I don't really have detailed thoughts, and didn't think it was a priority to generate them on-the-spot.

My shoulder-Robin answers as a response to your post: "We can, if polycentric assurance democracy works out, have a mechanism for generating and then perpetuating many different cultures with different norms. If these cultures respect each others' boundaries (unlike today), these many cultures can then compete with each other on how appealing they are to people, and also on how good they are at producing more humans. The problem is whether children can be grandfathered into an assurance contract—if they can, then a culture can remain stable and perpetuate itself and its fertility norms; if they *can't*, then we have the same problem as in the current world, where low-fertility cultures are more appealing and offspring of high-fertility cultures move over to low-fertility cultures/assurance-contract-clusters."

Comments (9)

Hmm, this sure is a kind of weird edge-case of the coauthor system for dialogues. 

I do think there really should be some indication that niplav hasn't actually responded here, but also, I don't want to just remove them and via that remove their ability to respond in the dialogue. I'll leave it as is for now and hope this comment is enough to clear things up, but if people are confused I might temporarily remove niplav.

I've now sent the following message to niplav, asking them if they want me to take the dialogue down and republish it as a shortform.  I am slightly embarrassed about not having considered that it's somewhat inconvenient to receive one of these dialogue-things without warning.


I just didn't think through that dialogue-post thing at all. Obviously it will show up on your profile-wall (I didn't think about that), and that has lots of reputational repercussions and such (which matter!). I wasn't simulating your perspective at all in the decision to publish it in the way I did. I just operated on heuristics like:

  • "it's good to have personal convo in public"
    • so our younglings don't grow into an environment of pluralistic ignorance, thinking they are the only ones with personality
  • "it's epistemically healthy to address one's writing to someone-in-particular"
    • eg bc I'm less likely to slip into professionalism mode
    • and bc that someone-in-particular (𖨆) is less likely to be impressed by fake proxies for good reasoning like how much work I seem to have put in, how mathy I sound, how confident I seem, how few errors I make, how aware-of-existing-research I seem, ...
    • and bc 𖨆 already knows me, it's difficult to pretend I know more than I do
      • eg if I write abt Singular Learning Theory to the faceless crowd, I could easily convince some of them that I like totally knew what I was talking about;  but when I talk to you, you already know something abt my skill-level, so you'd be able to smell my attempted fakery a mile away
  • "other readers benefit more (on some dimensions) from reading something which was addressed to 𖨆, because
    • "It is as if there existed, for what seems like millennia, tracing back to the very origins of mathematics and of other arts and sciences, a sort of “conspiracy of silence” surrounding [the] “unspeakable labors” which precede the birth of each new idea, both big and small…"
      — Alexander Grothendieck, "The Anarchist Abstractionist — Who was Alexander Grothendieck?"

 

---

If you prefer, I'll move the post into a shortform preceded by:

[This started as something I wanted to send to niplav, but then I realized I wanted to share these ideas with more people.  So I wrote it with the intention of publishing it, while keeping the style and content mostly as if I had purely addressed it to them alone.]

I feel somewhat embarrassed about having posted it as a dialogue without thinking it through, and this embarrassment exactly cancels out my disinclination against unpublishing it, so I'm neutral wrt moving it to shortform.  Let me know!  ^^

P.S. No hurry.

In this particular instance, I'm completely fine with this happening—because I trust & like you :-)

In general, this move is probably too much for the other party, unless they give consent. But as I said, I'm fine/happy with being addressed in a dialogue—and LessWrong is better for this than schelling.pt, especially for the longer convos we tend to have. Who knows, maybe I'll even find time to respond in a non-vacuous manner & we can have a long-term back & forth in this dialogue.

Quick mod note – this post seems like a pretty earnest, well intentioned version of "address a dialogue to someone who hasn't opted into it". But, it's the sort of thing I'd expect to often be kind of annoying. I haven't chatted with other mods yet about whether we want to allow this sort of thing longterm, but, flagging that we're tracking it as an edge case to think about.

niplav

Just to ward off misunderstanding and/or possible feelings of todo-list-overflow:  I don't expect you to engage or write a serious reply or anything;  I mostly just prefer writing in public to people-in-particular, rather than writing to the faceless crowd.  Treat it as if I wrote a Schelling.pt outgabbling in response to a comment;  it just happens to be on LW.  If I'm breaking etiquette or causing miffedness for Complex Social Reasons (which are often very valid reasons to have, just to be clear) then lmk!  : )

Wait, so is this a dialogue or not? It's certainly styled as one, and has two LW users as authors, but... only one of them has written anything (yet), and seemingly doesn't expect the other one to engage? Wouldn't this have been better off as a post or a shortform or something like that?

I wanted to leave Niplav the option of replying at correspondence-pace at some point if they felt like it.  I also wanted to say these things in public, to expose more people to the ideas, but without optimizing my phrasing/formatting for general-audience consumption.

I usually think people think better if they generally aim their thoughts at one person at a time.  People lose their brains and get eaten by language games if their intellectual output is consistently too impersonal.

Also, I think if I were somebody else, I would appreciate me for sharing a message which I₁ mainly intended for Niplav, as long as I₂ managed to learn something interesting from it.  So if I₁ think it's positive for me₂ to write the post, I₁ think I₁ should go ahead.  But I'll readjust if anybody says they dislike it.  : )

Alright, that's cool. I'm not actually bothered by it, and in any case I don't intend to exert social pressure to enforce any kind of norm about this that mods might disagree with.
