I've only read a couple of pieces on corrigibility, but I'm curious why the following scheme wouldn't work; I haven't seen any work in this direction (from my admittedly very brief scan).
Suppose that at every time t some decision maker chooses a probability p(t) with which the AI is shut down. Further, suppose the AI's utility function in each period was initially U(t). Now rescale the AI's utility function to U(t)/(1-p(t)). I think this can be quite easily generalized so that future periods of utility are likewise unaffected by an increase in the risk of shutdown (e.g. scale each period's utility by 1 over the product of (1-p(s)) across all intervening time periods).
In this world the AI should be indifferent to changes in p(t) (as long as p(t) stays strictly below 1, even if it gets arbitrarily close), and so should take actions that maximize U whilst being indifferent to whether humans decide to shut it down.
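To spell out the arithmetic behind the indifference claim (my notation; this assumes the AI maximizes expected utility and that shutdown is permanent): the AI only receives the period-t utility if it survives the shutdown draw at every time s ≤ t, which happens with probability ∏_{s≤t}(1-p(s)). Under the rescaling, the expected contribution of period t is

$$\Big(\prod_{s \le t} (1 - p(s))\Big) \cdot \frac{U(t)}{\prod_{s \le t} (1 - p(s))} = U(t),$$

which doesn't depend on any of the p(s), so changing the shutdown probabilities shouldn't change the AI's expected utility.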
I don't really understand the concern here. From Google's perspective there was basically a near certainty that the NYT/media/some people on Twitter would have complaints about Gemini regardless of the model outputs. Their options were to have the complaints be about either i) the model apparently offering advice on 'dangerous' topics or perpetuating biases, both of which seem conducive to requests for more regulation, or ii) some outrage/counter-outrage about 'wokeness', which may drive some clicks but is unlikely to make any serious policymaker want to regulate, and which, once quietly fixed a couple of weeks later, will have caused basically no actual economic or other harms or costs. Further, with ii) you get some evidence/talking points for the many times in the coming years that people push for regulation forcing AI models to censor certain outputs, or for anti-bias/discrimination proposals. It seems to me that ii) is so strongly preferable that one would want to basically guarantee the model errs towards ii) rather than i) when first releasing it.
Enjoyed reading this (although I'm still making my way through the latter parts); I think it's useful to have a lot of these ideas more formalised.
I'm wondering whether this simple idea would run up against Theorem 2 (or any of the others), or why it wouldn't work.
Suppose that at every time t some decision maker chooses a probability p(t) with which the AI is shut down. Further, suppose the AI's utility function in each period was initially U(t). Now rescale the AI's utility function to U(t)/(1-p(t)). I think this can be quite easily generalised so that future periods of utility are likewise unaffected by an increase in the risk of shutdown.
In this world the AI should be indifferent to changes in p(t) (as long as p(t) stays strictly below 1, even if it gets arbitrarily close), and so should take actions that maximise U whilst being indifferent to whether humans decide to shut it down.
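As a quick numerical sanity check of that indifference claim, here's a minimal simulation sketch (my own illustrative code, not anything from the paper; the horizon, utilities, and shutdown probabilities are all made up):

```python
import numpy as np

# Minimal Monte Carlo check that rescaling period-t utility by
# 1 / prod_{s<=t}(1 - p(s)) makes expected utility independent of the
# shutdown probabilities. Horizon, utilities and probabilities are made up.

rng = np.random.default_rng(0)

U = np.array([3.0, 1.0, 4.0, 1.5, 2.0])   # base per-period utilities
T = len(U)

def expected_scaled_utility(p, n_trials=200_000):
    survival = np.cumprod(1 - p)   # prob. of still running at each period
    scaled_U = U / survival        # rescaled utility schedule
    total = 0.0
    for _ in range(n_trials):
        for t in range(T):
            if rng.random() < p[t]:    # shutdown draw at period t
                break                  # no further utility after shutdown
            total += scaled_U[t]
    return total / n_trials

low_risk = np.full(T, 0.05)
high_risk = np.full(T, 0.60)

# Both estimates should come out close to U.sum() = 11.5.
print(expected_scaled_utility(low_risk))
print(expected_scaled_utility(high_risk))
```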
Great read; just one thing I'd add re the housing discussion, which I think is sometimes neglected: unlike essentially all other assets, everyone is naturally short housing, since they're always going to need somewhere to live. As such, buying a similar amount of housing as one plans to use in the future (this can also be done via investing disproportionately in housebuilding stocks or buying shares of various properties) can reduce your exposure to future price increases (which historically have been large, especially if you value living in cities/desirable locations).
Housing could therefore be a sensible investment even if you thought an index fund would yield slightly higher returns. This effect is further heightened by the fact that most individuals have extremely limited access to leverage unless they're buying a house, and by the tax advantages associated with a mortgage.
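To make the hedging point concrete, here's a toy sketch (all numbers are made up, and rent, maintenance, taxes, and transaction costs are ignored) comparing someone who buys their future home now with someone who rents and holds an index fund:

```python
# Toy illustration of the "everyone is naturally short housing" point.
# All numbers are made up; rent, maintenance, taxes, and transaction
# costs are ignored. "Leftover" means wealth remaining once the future
# housing need has been secured.

wealth = 500_000              # starting wealth for both people
house_price_today = 400_000   # price of the home each will eventually need
index_factor = 1.3            # index fund return over the horizon

for housing_factor in (1.5, 1.0):   # housing boom vs. flat housing market
    # Owner: buys the house now, invests the spare cash in the index.
    owner_leftover = (wealth - house_price_today) * index_factor

    # Renter: invests everything in the index, buys the house at the end.
    renter_leftover = wealth * index_factor - house_price_today * housing_factor

    print(housing_factor, owner_leftover, renter_leftover)

# The owner ends up with 130,000 in both scenarios, while the renter's
# outcome swings between 50,000 and 250,000 depending on the housing
# market: owning hedges the price risk, even though it's a bet on risk
# reduction rather than on expected return.
```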
Regarding Ozzie's point above about markets where agents can pay to acquire costly information: this is the famous Grossman-Stiglitz paradox (1980), and indeed their paper concludes that informationally efficient markets are impossible in this setting.
Just adding some additional context that might be useful. PredictIt is a similar election betting platform but has a cap on the maximum amount traders are able to bet (I think <$1k, so relatively low). This means that if Polymarket is a money-weighted information aggregation mechanism, PredictIt is a person-weighted information aggregation mechanism. As noted in the post, from 6th October to just now Trump has gone from 50.8 to 60.1 on Polymarket, meaning his lead over Kamala has gone from 1.6 cents to 20.2 cents (an 18.6 cent swing). On PredictIt he's gone from 51:53 to 55:48 over the same interval (a 9 cent swing). Of course it's possible that people on PredictIt are changing their bets in response to Polymarket prices (e.g. by arbitraging), but there's some evidence here that at least half the change isn't due to Polymarket's large bettors (whether those large bettors are trading based on private information, for manipulation purposes, or for other reasons).
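For concreteness, here's the arithmetic behind those swing figures (a quick sketch using only the prices quoted above; Polymarket's Kamala prices are taken as roughly 100 minus Trump's, which matches the quoted differences):

```python
# Quick check of the swing figures quoted above (prices in cents).

def margin_swing(trump_then, kamala_then, trump_now, kamala_now):
    """Change in Trump's lead over Kamala between the two dates."""
    return (trump_now - kamala_now) - (trump_then - kamala_then)

# Polymarket: 50.8 vs 49.2 on 6 October, 60.1 vs 39.9 now.
print(margin_swing(50.8, 49.2, 60.1, 39.9))   # ~18.6 cent swing

# PredictIt: 51 vs 53 on 6 October, 55 vs 48 now (yes-prices needn't sum to 100).
print(margin_swing(51, 53, 55, 48))           # 9 cent swing
```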