Daniel C

Master's student in applied mathematics, funded by the Center on Long-Term Risk to investigate the cheating problem in safe Pareto improvements. Agent foundations fellow with @Alex_Altair

Some other areas I'm interested in:

  • Investigate properties of general purpose search so that we can handcraft it & simply retarget the search
  • Investigate the type signature of world models to find properties that remain invariant under ontology shifts
  • Natural latents
    • How to characterize natural latents in settings like PDEs?
    • Equivalence of natural latents under transformation of variables
  • Formalizing automated design
  • Information theoretic impact measures
  • Scalable blockchain consensus mechanisms
  • Programming language for concurrency
  • Quantifying optimization power without assuming a particular utility function
  • What mathematical axioms would emerge in a Solomonoff inductor?
  • How things like Riemannian metrics & differential equations might emerge from discrete systems

Comments

There's a chicken-and-egg problem here[...] and then using that assumption to prove that markets are causal.

That argument was more about accommodating "different traders with different beliefs", but here's an independent argument for the market being causal:

When I cause a particular effect/outcome, that means I mediate the influence between the cause of my action and the effect/outcome of my action: the cause of my action is conditionally independent of the effect of my action, given me.

Futarchy is a similar case: There may be many causes that influence market prices, which in turn determine the decision chosen, & market prices mediate the influence between the causes of market prices (e.g. different traders' beliefs) and the decision chosen. Any information can only influence which decision will be chosen through influencing the market prices. This seems like what it means for the market to be causal (in a Bayes net, the decision chosen will literally have only market prices as its parent, assuming we commit to using futarchy to choose decisions).
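As a toy illustration of that mediation claim (the averaging rule and the specific numbers below are made-up stand-ins, not a claim about how real markets form prices): traders' beliefs feed into the conditional prices, and the decision is a function of the prices alone.

```python
import random

def market_price(beliefs):
    """Stand-in for price formation: average of traders' subjective E_i[u | d]."""
    return sum(beliefs) / len(beliefs)

def futarchy_decision(p1, p2):
    """Commit to the decision whose conditional market has the higher price."""
    return "d1" if p1 >= p2 else "d2"

# Traders' beliefs influence the decision only through the prices:
# futarchy_decision never sees the beliefs themselves.
beliefs_d1 = [random.random() for _ in range(5)]  # each trader's E_i[u | d1]
beliefs_d2 = [random.random() for _ in range(5)]  # each trader's E_i[u | d2]

p1, p2 = market_price(beliefs_d1), market_price(beliefs_d2)
print(futarchy_decision(p1, p2))
```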

The first expectation needs to be conditioned on the market activating. (That is not conditionally independent of u given d1 in general.)

If we commit to using futarchy to choose the decision, then market 1 activating will have exactly the same truth conditions as executing d1, so "market 1 activating and d1" would be the exact same thing as "d1" itself (committing to use futarchy to choose the decision means we assign 0 probability to "first market activating & execute d2" or "second market activating & execute d1").
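Spelling that out under the same commitment assumption:

$$P(\text{market 1 activates},\, d_2) \;=\; P(\text{market 2 activates},\, d_1) \;=\; 0$$
$$\Rightarrow\quad \{\text{market 1 activates}\} = \{d_1\} \ \text{(up to probability-0 events)}$$
$$\Rightarrow\quad E_i[u \mid d_1,\ \text{market 1 activates}] \;=\; E_i[u \mid d_1]$$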

Different people have different beliefs, so the expectations are different for different traders. You can't write "E" without specifying for which trader.

Yes, we can replace E with E_i, and then argue that traders with accurate beliefs will accumulate more money over time, making market estimates more accurate in the limit.

My main objection to this logic is that there doesn't seem to be any reflection of the idea that different traders will have different beliefs.[...] All my logic is based on a setup where different traders have different beliefs.


Over time, traders who have more accurate beliefs (& act rationally according to those beliefs) will accumulate more money in expectation (& vice versa), so in the limit we can think of futarchy as aggregating the beliefs of different traders, weighted by how accurate their beliefs were in the past.
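A toy simulation of that wealth-dynamics claim (the fixed price, betting fraction, and belief values are assumptions purely for illustration):

```python
import random

random.seed(0)

TRUE_P, PRICE, FRACTION = 0.7, 0.55, 0.1   # true P(event), market price, bet size
belief = {"accurate trader": 0.7, "inaccurate trader": 0.4}
wealth = {name: 1.0 for name in belief}

for _ in range(500):
    outcome = random.random() < TRUE_P
    for name in belief:
        stake = FRACTION * wealth[name]
        if belief[name] > PRICE:
            # buy "yes" at PRICE: win (1 - PRICE)/PRICE per dollar staked, else lose the stake
            wealth[name] += stake * ((1 - PRICE) / PRICE if outcome else -1)
        else:
            # buy "no" at 1 - PRICE: win PRICE/(1 - PRICE) per dollar staked, else lose the stake
            wealth[name] += stake * (PRICE / (1 - PRICE) if not outcome else -1)

# The trader whose beliefs match the true probability accumulates wealth in expectation;
# the inaccurate trader loses it, so accurate beliefs end up controlling more of the
# market's capital over time.
print(wealth)
```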

So I don't think the condition "p1>E[u|d1]" really makes sense? [...]and this makes it unlikely that the market will converge to E[u|d1].

If I pay p1 for a contract in market 1, my expected payoff is:

P_i(market 1 activates) * (E_i[u | d1, market 1 activates] - p1)

(since I get my money back if d2/market 2 is activated)

this is negative iff p1 > E_i[u | d1, market 1 activates] and positive iff p1 < E_i[u | d1, market 1 activates],

and if we commit to using futarchy to choose the decision, then d1 is chosen iff market 1 activates, so E_i[u | d1, market 1 activates] should equal E_i[u | d1].
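As a quick numeric check (the numbers are made up):

```python
def expected_profit(p1, e_u_given_d1, p_activate):
    """Expected profit of paying p1 for a market-1 contract: pays u if market 1
    activates (d1 chosen), refunds p1 otherwise."""
    return p_activate * (e_u_given_d1 - p1)

print(expected_profit(p1=0.7, e_u_given_d1=0.8, p_activate=0.5))  # +0.05: worth buying
print(expected_profit(p1=0.9, e_u_given_d1=0.8, p_activate=0.5))  # -0.05: sell / don't buy
```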
 

We want you to pay more for a contract for coin A, since that’s the coin you think is more likely to be heads (60% vs 59%). But if you like money, you’ll pay more for a contract on coin B. You’ll do that because other people might figure out if it’s an always-heads coin or an always-tails coin. If it’s always heads, great, they’ll bid up the market, it will activate, and you’ll make money. If it’s always tails, they’ll bid down the market, and you’ll get your money back.

 

Let's call "Bidding on B, hoping that other people will figure out if B is an always-heads or always-tails coin" strategy X, and call "Figure out if B is always-heads or always-tails myself & bid accordingly, or if I can't, bid on A because it's better in expectation" strategy Y.

 

If I believe that a sufficient number of people in the market are using strategy Y, then it's beneficial for me to use strategy X, and insofar as my beliefs about the market are accurate, this is okay, because a sufficient number of people using strategy Y means the market will actually figure out whether B is always-heads or always-tails and bid accordingly. So the market selects the right decision, insofar as my beliefs about the market are correct. (Note that I'm never incentivized to place a bid on B so large that it causes B to activate, since I don't actually know if B is always-heads.)

 

On the other hand, if I believe that the vast majority of people in the market are using strategy X instead of strategy Y, then it's no longer beneficial for me to use strategy X myself; I should instead use strategy Y, because the market doesn't actually do the work of finding out whether coin B is always-heads for me. Other traders who have accurate beliefs about the market will switch to strategy Y as well, until there are enough traders to push the market towards the right decision.

So insofar as people have accurate beliefs about the market, the market will end up selecting the right decision (either a sufficient number of people use strategy Y, in which case it's robust for me to use strategy X, or not enough people are using strategy Y, in which case people are incentivized to switch to Y).
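A toy expected-profit calculation for the two-coin example above (the numbers, and the assumption that strategy-Y traders fully reveal B's type, are illustrative):

```python
P_B_ALWAYS_HEADS = 0.59   # prior that coin B is the always-heads coin
price = 0.60              # what I pay for a contract on coin B

# Strategy X, assuming enough strategy-Y traders reveal B's type:
# if B is always-heads, market B gets bid up, activates, and the contract pays 1;
# if B is always-tails, market A activates and my money is refunded.
profit_X_if_market_informed = P_B_ALWAYS_HEADS * (1 - price)

# If nobody uses strategy Y, B's type never gets revealed, market B stays below
# market A, and my B contract is simply refunded.
profit_X_if_market_uninformed = 0.0

print(profit_X_if_market_informed)    # 0.236: X is profitable iff the market does the work
print(profit_X_if_market_uninformed)  # 0.0
```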

More generally, what's the argument that the market will always select the decision that leads to the higher expected payout?

"Always" might be too strong, but very informally:

Suppose that we have decisions d1 and d2, with outcome/payoff u & conditional market prices p1 (corresponding to d1) and p2 (corresponding to d2).
 

if p1>E[u|d1], then traders are incentivized to sell & drive down p1. Similarly, they will be incentivized to bid up p1 if p1<E[u|d1]. So p1 will tend toward E[u|d1], and we can argue similarly for p2 tending towards E[u|d2].

Since we choose the decision with the higher price, and prices tend towards the expected payoff given that decision, the market ends up choosing the decision that leads to the higher expected payoff.
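A crude sketch of that convergence story (the update rule below is a stand-in, not a model of real market microstructure):

```python
import random

random.seed(1)

E_U = {"d1": 0.72, "d2": 0.65}      # true conditional expected payoffs (assumed numbers)
prices = {"d1": 0.5, "d2": 0.5}
STEP = 0.05

# Traders whose estimates are noisy but unbiased around E[u | d] buy below their
# estimate and sell above it, nudging each conditional price toward E[u | d].
for _ in range(500):
    for d in prices:
        trader_estimate = E_U[d] + random.gauss(0, 0.05)
        prices[d] += STEP * (trader_estimate - prices[d])

print(prices)                       # each price ends up near the corresponding E[u | d]
print(max(prices, key=prices.get))  # "d1": the higher-expected-payoff decision gets chosen
```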

I think I'm claiming 3, namely all we want from futarchy is for it to select the decision with the highest expected payout, and for that the property isn't necessary.

Ex: For the two-coin, two-market case, the first market's price estimates the expected payout if we flip coin one (& similarly for the second market), & while neither market satisfies the property (E[f]=E[z] always), we would still select the decision that leads to the higher expected payout (as we select the higher price), and that's all that's needed.

I think that's right. (I guess technically it depends on what version of Futarchy you're trying to use. You could have a single market for a single coin that's flipped iff the final price is above some threshold.)[...] That doesn't fit into my assumptions.

Yep agreed.

But I think you can also pretty easily generalize the proof? [...] just changing f(x,Y,Z) to f(x,Y,Z,C) everywhere?

The proof definitely shows that within a single market (e.g. conditional on y>=c), you would be indifferent to Z given the opposite counterfactual (y<c), but that's okay because (in the two-market, two-coin case) we have two markets (estimating y and c), and each of the two markets responds to one of the two possible counterfactuals (y>=c or y<c).

So although, in the first market (conditional on y>=c), I would be indifferent between two scenarios that imply different distributions on Z conditional on y<c, I would not be indifferent between them in the second market (conditional on y<c), and that difference would be reflected in c, which affects whether y>=c or y<c (& therefore which decision will be chosen).

Suppose you run a market where if you pay x and the final market price is y and z happens, then you get a payout of f(x,y,z) dollars. The payout function can be anything, subject only to the constraint that if the final market price is below some constant c, then bets are cancelled, i.e. f(x,y,z)=x for y < c.


But in futarchy the "threshold price" c wouldn't be constant; it would be the price of the market conditional on the scenario y<c.

IIUC the theorem is saying that you would be indifferent to whatever happens to Z if y<c, but that counterfactual would be estimated by another market (which estimates c) that activates when y<c and cancels when y>=c
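A sketch of what that looks like concretely (this is one particular futarchy variant, written out just to illustrate that each market's cancellation threshold is the other market's final price rather than a fixed constant):

```python
def payout_market_1(x, y1, y2, z):
    """Market 1 (conditional on d1): x = amount paid, y1/y2 = final prices of the
    two markets, z = realized payoff u. Activates iff y1 >= y2, otherwise refunds."""
    return z if y1 >= y2 else x

def payout_market_2(x, y1, y2, z):
    """Market 2 (conditional on d2): activates iff y1 < y2, otherwise refunds."""
    return z if y1 < y2 else x

# Final prices y1 = 0.7 and y2 = 0.6, so d1 is chosen: market 1 pays out, market 2 cancels.
print(payout_market_1(x=0.65, y1=0.7, y2=0.6, z=1.0))  # 1.0 (contract pays the realized u)
print(payout_market_2(x=0.55, y1=0.7, y2=0.6, z=1.0))  # 0.55 (money back)
```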

Great post! Agree with the points raised but would like to add that restricting the expressivity isn’t the only way that we can try to make the world model more interpretable by design. There are many ways that we can decompose a world model into components, and human concepts correspond to some of the components (under a particular decomposition) as opposed to the world model as a whole. We can backpropagate desiderata about ontology identification to the way that the world model is decomposed.

 

For instance, suppose that we’re trying to identify the concept of a strawberry inside a Solomonoff inductor: We know that once we identify the concept of a strawberry inside a Solomonoff inductor, it needs to continue to work even when the inductor updates to new potential hypotheses about the world (e.g. we want the concept of a strawberry to still be there even when the inductor learns about QFT). This means that we’re looking for redundant information that is present in a wide variety of likely hypotheses given our observations, so instead of working with all the individual TMs, we can try to capture the redundant information shared across a wide variety of TMs consistent with our existing observations (& we expect the concept of a strawberry to be part of that redundant information, as opposed to the information specific to any particular hypothesis).

 

This obviously doesn’t get us all the way there but I think it’s an existence proof for cutting down the search space for “human-like concepts” without sacrificing the expressivity of the world model, by reasoning about what parts of the world model could correspond to human-like concepts

I think one pattern which needs to hold in the environment in order for subgoal corrigibility to make sense is that the world is modular, but that modularity structure can be broken or changed


For one, modularity is the main thing that enables general purpose search: If we can optimize for a goal by just optimizing for a few instrumental subgoals while ignoring the influence of pretty much everything else, then that reflects some degree of modularity in the problem space

Secondly, if the modularity structure of the environment stays constant no matter what (e.g. we can represent it as a fixed causal DAG), then there would be no need to "respect modularity", because any action we take would preserve the modularity of the environment by default (given our assumption). We would only need to worry about side effects if there's at least a possibility for those side effects to break or change the modularity of the problem space, and that means the modularity structure of the problem space is a thing that can be broken or changed.


Example of the modularity structure of the environment changing: Most objects in the world pretty much only have direct influence on other objects nearby, and we can break or change that modularity structure by moving objects to different positions. In particular, the positions are the variables which determine the modularity of "which objects influence which other objects", and the way that we "break" the modularity structure between the objects is by intervening on those variables.
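A minimal sketch of that picture (the objects, 1-D positions, and distance threshold are made up): the first-order influence structure is a function of the second-order variables (positions), so intervening on a position changes which objects can influence which.

```python
RADIUS = 1.5  # objects only directly influence other objects within this distance

def influence_graph(positions):
    """The 'modularity structure': edges between objects close enough to interact."""
    return {
        (a, b)
        for a in positions for b in positions
        if a != b and abs(positions[a] - positions[b]) < RADIUS
    }

positions = {"oven": 0.0, "bowl": 1.0, "mixer": 5.0}
print(sorted(influence_graph(positions)))  # oven and bowl interact; the mixer is isolated

positions["mixer"] = 1.8                   # intervene on a second-order variable (a position)
print(sorted(influence_graph(positions)))  # now the mixer and bowl can influence each other
```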

 

So we know that "subgoal corrigibility" requires the environment to be modular, but with a modularity structure that can be broken or changed. If this is true, then the modularity structure of the environment can be tracked by a set of "second-order" variables such as position, which tell us "what things influence what other things" (in particular, these second-order variables might themselves satisfy some sort of modularity structure that can be changed, and we may have third-order variables that track the modularity structure of the second-order variables). The way that we "respect the modularity" of other instrumental subgoals is by preserving these second-order variables that track the modularity structure of the problem space.

 

For instance, we get to break down the goal of baking a cake into instrumental subgoals such as acquiring cocoa powder (while ignoring most other things) if and only if a particular modularity structure of the problem space holds (e.g. the other equipment is all in the right place & right positions), and there is a set of variables that track that modularity structure (the conditions & positions of the equipment). The way we preserve that modularity structure is by preserving those variables (the conditions & positions of the equipment).

 

Given this, we might want to model the world in a way that explicitly represents variables that track the modularity of other variables, so that we get to preserve influence over those variables (and therefore the modularity structure that GPS relies on)
