jessicata
Jessica Taylor. CS undergrad and Master's at Stanford; former research fellow at MIRI.

I work on decision theory, social epistemology, strategy, naturalized agency, mathematical foundations, decentralized networking systems and applications, theory of mind, and functional programming languages.

Blog: unstableontology.com

Twitter: https://twitter.com/jessi_cata

Comments
Matrices map between biproducts
jessicata · 2d

I believe bra/ket is for row and column vectors. I don't think it applies here, because in the general case (semiadditive categories), you have arbitrary linear maps as the $h_{j,i}$ entries. And in the $\mathbb{R}^m \to \mathbb{R}^n$ case, they're reals, not row or column vectors.

It is true that you can decompose as either ⟨[…]…[…]⟩ or [⟨…⟩…⟨…⟩]. To be clear, I'm using ⟨⟩ and [] from category theory product/coproduct notation; it's not meant to match linear algebra or bra/ket notation.
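
As an illustration of the two decompositions (a sketch, not taken from the original comment): assume a map $h : A_1 \oplus \dots \oplus A_m \to B_1 \oplus \dots \oplus B_n$ with components $h_{j,i} : A_i \to B_j$, where $[\,\cdot\,]$ is copairing out of the coproduct and $\langle\,\cdot\,\rangle$ is pairing into the product. The object names $A_i$, $B_j$ are assumptions introduced here for the example.

```latex
\begin{align*}
h &= \bigl[\, \langle h_{1,1}, \dots, h_{n,1} \rangle,\ \dots,\ \langle h_{1,m}, \dots, h_{n,m} \rangle \,\bigr]
  && \text{(copairing of columns)} \\
  &= \bigl\langle\, [h_{1,1}, \dots, h_{1,m}],\ \dots,\ [h_{n,1}, \dots, h_{n,m}] \,\bigr\rangle
  && \text{(pairing of rows)}
\end{align*}
```

In the $\mathbb{R}^m \to \mathbb{R}^n$ case, each $A_i$ and $B_j$ would be $\mathbb{R}$, so each $h_{j,i}$ is a linear map $\mathbb{R} \to \mathbb{R}$, that is, multiplication by a single real number.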

Matrices map between biproducts
jessicata · 2d

I don't understand the notation; it looks like bra/ket except not quite?

AI Doomers Should Raise Hell
jessicata · 19d

Most of the alignment problem in this case would be getting to stratified utopia. If stratified utopia is going to be established, then there can be additional trades on top, though they have to be restricted so as to maintain stratification.

With current models, a big issue is how to construe their preferences. Given that they're stateless, it's unclear how they could know others are assisting them. I guess they could do a web search and find it in context? Future models could be trained to "know" things, but they wouldn't be the same model.

And also, would they be motivated to hold up their end of the bargain? It seems like that would require something like interpretability, which would also be relevant to construing their preferences in the first place. But if they can be interpreted to this degree, more direct alignment might be feasible.

Like, there are multiple regimes imaginable:

  1. Interpretability/alignment infeasible
  2. Partial interpretability/alignment feasible; possible to construe preferences and trade with LLMs
  3. Extensive interpretability/alignment feasible

Trade is most relevant in regime 2. However, I'm not sure why regime 2 would be likely.

AI Doomers Should Raise Hell
jessicata · 19d

Roko's basilisk is the optimistic hypothesis that making binding agreements with non-existent superintelligences is possible. If Roko's basilisk works, then "trade" with superintelligences can be effective; by making a deal with a superintelligence, you can increase its likelihood of existing, in return for it holding its end of the bargain, increasing the satisfaction of your values.

This probably doesn't work. But if it did work, it would be a promising research avenue for alignment. (Whether it's good to say that it works is probably dominated by whether it's true that it works, and I'm guessing no.)

Homomorphically encrypted consciousness and its implications
jessicata · 22d

I think if M isn't "really mental", like there is no world representation, it shouldn't be included in M. I'm guessing depending on the method of encryption, keys might be checkable. If they are not checkable there's a pigeonhole argument that almost all (short) keys would decrypt to noise. Idk if it's possible to "encrypt two minds at once" intentionally with homomorphic encryption.

And yeah, if there isn't a list of minds in R, then it's hard for g to be efficiently computable, as it would be a search. That's part of what makes homomorphically encrypted consciousness paradoxical, and what makes possibility C worth considering.

Regarding subjective existence of subjective states: I think if you codify subjective states then you can ask questions like "which subjective states believe other subjective states exist?", since that is a belief similar to other beliefs.

Homomorphically encrypted consciousness and its implications
jessicata · 24d

See paragraph at the end on the trivialism objection to functionalism

Posts

32 · Matrices map between biproducts · 2d · 4
33 · Homomorphically encrypted consciousness and its implications · 26d · 38
64 · A philosophical kernel: biting analytic bullets · 3mo · 21
33 · Measuring intelligence and reverse-engineering goals · 3mo · 10
17 · Towards plausible moral naturalism · 4mo · 9
23 · Generalizing zombie arguments · 4mo · 9
21 · The Weighted Perplexity Benchmark: Tokenizer-Normalized Evaluation for Language Model Comparison · Ω · 4mo · Ω 0
27 · Why I am not a Theist · 4mo · 6
20 · "Self-Blackmail" and Alternatives · 9mo · 12
96 · On Eating the Sun · 10mo · 98