Jessica Taylor. CS undergrad and Master's at Stanford; former research fellow at MIRI.
I work on decision theory, social epistemology, strategy, naturalized agency, mathematical foundations, decentralized networking systems and applications, theory of mind, and functional programming languages.
Blog: unstableontology.com
Twitter: https://twitter.com/jessi_cata
I don't understand the notation; it looks like bra/ket except not quite?
Most of the alignment problem in this case would be getting to stratified utopia. Once stratified utopia is established, there can be additional trades on top, though they have to be restricted so as to maintain stratification.
With current models, a big issue is how to construe their preferences. Given that they're stateless, it's unclear how they could know others are assisting them. I guess they could do a web search and find it in context? Future models could be trained to "know" things, but they wouldn't be the same model.
And also, would they be motivated to hold up their end of the bargain? It seems like that would require something like interpretability, which would also be relevant to construing their preferences in the first place. But if they can be interpreted to this degree, more direct alignment might be feasible.
Like, there are multiple regimes imaginable:
And trade is most relevant in regime 2. However, I'm not sure why 2 would be likely.
Roko's basilisk is the optimistic hypothesis that making binding agreements with non-existent superintelligences is possible. If Roko's basilisk works, then "trade" with superintelligences can be effective: by making a deal with a superintelligence, you can increase its likelihood of existing, in return for it holding up its end of the bargain, increasing the satisfaction of your values.
This probably doesn't work. But if it did work, it would be a promising research avenue for alignment. (Whether it's good to say that it works is probably dominated by whether it's true that it works, and I'm guessing no.)
I think if a candidate mind isn't "really mental", e.g. there is no world representation, it shouldn't be included in M. I'm guessing that depending on the method of encryption, keys might be checkable. If they are not checkable, there's a pigeonhole argument that almost all (short) keys would decrypt to noise. Idk if it's possible to "encrypt two minds at once" intentionally with homomorphic encryption.
And yeah, if there isn't a list of minds in R, then it's hard for g to be efficiently computable, as it would be a search. That's part of what makes homomorphically encrypted consciousness paradoxical, and what makes possibility C worth considering.
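The checkability point can be sketched with a toy construction (my own illustration, not anything from the thread): pair a stream cipher with a MAC over the ciphertext, so that a wrong key is rejected outright rather than "successfully" decrypting to noise. The function names and the SHA-256-in-counter-mode keystream are assumptions for the sketch; this is not a secure cipher.

```python
import hashlib
import hmac
import os

def keystream(key: bytes, n: int) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode (illustration only, not secure).
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def encrypt(key: bytes, msg: bytes) -> bytes:
    ct = bytes(a ^ b for a, b in zip(msg, keystream(key, len(msg))))
    tag = hmac.new(key, ct, "sha256").digest()  # the tag makes the key "checkable"
    return ct + tag

def try_decrypt(key: bytes, blob: bytes):
    ct, tag = blob[:-32], blob[-32:]
    if not hmac.compare_digest(hmac.new(key, ct, "sha256").digest(), tag):
        return None  # wrong key: rejected, instead of decrypting to noise
    return bytes(a ^ b for a, b in zip(ct, keystream(key, len(ct))))

key = os.urandom(16)
blob = encrypt(key, b"a mind's state")
assert try_decrypt(key, blob) == b"a mind's state"
# With a 256-bit tag, a wrong key passes the check with probability ~2^-256,
# so almost all keys are rejected. Without the tag, every key "succeeds",
# but by the pigeonhole argument almost all short keys yield noise.
assert try_decrypt(os.urandom(16), blob) is None
```

With checkable keys, "which minds are encrypted here" has a determinate answer; without them, the question degenerates into the pigeonhole situation above.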
Regarding subjective existence of subjective states: I think if you codify subjective states, then you can ask questions like "which subjective states believe other subjective states exist?", since such a belief is similar to other beliefs.
See paragraph at the end on the trivialism objection to functionalism
I believe bra/ket is for row and column vectors. I don't think it applies here, because in the general case (semiadditive categories), you have arbitrary linear maps as the h_{j,i} entries. And in the R^m → R^n case, they're reals, not row or column vectors.
It is true that you can decompose as either ⟨[…]…[…]⟩ or [⟨…⟩…⟨…⟩]. To be clear I'm using ⟨⟩ and [] from category theory product/coproduct notation, it's not meant to match linear algebra or bra/ket notation.
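The two decompositions can be checked concretely in the R^m → R^n case with NumPy (variable names are my own): a matrix acts either as the tuple of its rows, each row being a cotuple of scalar entries, or as the cotuple of its columns, each column being a tuple, with the columns' outputs summed.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2
H = rng.integers(-5, 5, size=(n, m))  # linear map R^m -> R^n; entries h_{j,i}
x = rng.integers(-5, 5, size=m)

# Tuple of cotuples: H is the tuple of its rows <r_1, ..., r_n>,
# and each row r_j is the cotuple [h_{j,1}, ..., h_{j,m}] acting on
# the coordinates of x and summing.
by_rows = np.array([sum(H[j, i] * x[i] for i in range(m)) for j in range(n)])

# Cotuple of tuples: H is the cotuple of its columns [c_1, ..., c_m],
# each column c_i being the tuple <h_{1,i}, ..., h_{n,i}> : R -> R^n;
# the cotuple sums the columns' outputs.
by_cols = sum(H[:, i] * x[i] for i in range(m))

assert np.array_equal(H @ x, by_rows)
assert np.array_equal(H @ x, by_cols)
```

Both decompositions reduce to ordinary matrix-vector multiplication, which is why the matrix notation is ambivalent between them.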