jessicata

Jessica Taylor. CS undergrad and Master's at Stanford; former research fellow at MIRI.

I work on decision theory, social epistemology, strategy, naturalized agency, mathematical foundations, decentralized networking systems and applications, theory of mind, and functional programming languages.

Blog: unstableontology.com

Twitter: https://twitter.com/jessi_cata

Comments
Sorted by Newest
AI Doomers Should Raise Hell
jessicata · 12d

Most of the alignment problem in this case would be getting to stratified utopia. If stratified utopia is going to be established, then there can be additional trades on top, though they have to be restricted so as to maintain stratification.

With current models, a big issue is how to construe their preferences. Given that they're stateless, it's unclear how they could know others are assisting them. I guess they could do a web search and find it in context? Future models could be trained to "know" things, but they wouldn't be the same model.

And also, would they be motivated to hold up their end of the bargain? It seems like that would require something like interpretability, which would also be relevant to construing their preferences in the first place. But if they can be interpreted to this degree, more direct alignment might be feasible.

Like, there are multiple regimes imaginable:

  1. Interpretability/alignment infeasible
  2. Partial interpretability/alignment feasible; possible to construe preferences and trade with LLMs
  3. Extensive interpretability/alignment feasible

Trade is most relevant in regime 2. However, I'm not sure why regime 2 would be likely.

AI Doomers Should Raise Hell
jessicata · 12d

Roko's basilisk is the optimistic hypothesis that making binding agreements with non-existent superintelligences is possible. If Roko's basilisk works, then "trade" with superintelligences can be effective; by making a deal with a superintelligence, you can increase its likelihood of existing, in return for it holding its end of the bargain, increasing the satisfaction of your values.

This probably doesn't work. But if it did work, it would be a promising research avenue for alignment. (Whether it's good to say that it works is probably dominated by whether it's true that it works, and I'm guessing no.)

Homomorphically encrypted consciousness and its implications
jessicata · 15d

I think if something isn't "really mental", like if there is no world representation, it shouldn't be included in M. I'm guessing that, depending on the method of encryption, keys might be checkable. If they are not checkable, there's a pigeonhole argument that almost all (short) keys would decrypt to noise (toy sketch below). I don't know if it's possible to "encrypt two minds at once" intentionally with homomorphic encryption.
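As a toy illustration of the counting intuition (this is not homomorphic encryption; the XOR-stream cipher, the 16-bit key, and the "printable ASCII" proxy for structure are all invented for the example), brute-forcing every short key against a toy ciphertext shows that essentially only the true key decrypts to anything non-noise-like:

```python
import hashlib
import os

def keystream(key: bytes, n: int) -> bytes:
    """Derive an n-byte pseudorandom keystream from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_with_keystream(data: bytes, key: bytes) -> bytes:
    """XOR stream cipher: the same operation encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

def looks_structured(plaintext: bytes) -> bool:
    """Crude stand-in for 'not noise': every byte is printable ASCII."""
    return all(32 <= b < 127 for b in plaintext)

message = b"a tiny stand-in for a structured mental state"
true_key = os.urandom(2)  # 16-bit key, small enough to enumerate all candidates
ciphertext = xor_with_keystream(message, true_key)

# Count how many of the 2**16 candidate keys decrypt to "structured" output.
hits = sum(
    looks_structured(xor_with_keystream(ciphertext, k.to_bytes(2, "big")))
    for k in range(2**16)
)
print(f"{hits} of {2**16} candidate keys decrypt to printable output")
```

The point is just the counting: structured plaintexts are an exponentially small fraction of all byte strings, so a wrong key almost surely lands on noise. Real homomorphic schemes have far longer keys, but the pigeonhole-style asymmetry is the same.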

And yeah, if there isn't a list of minds in R, then it's hard for g to be efficiently computable, as it would be a search. That's part of what makes homomorphically encrypted consciousness paradoxical, and what makes possibility C worth considering.

Regarding the subjective existence of subjective states: I think if you codify subjective states, then you can ask questions like "which subjective states believe other subjective states exist?", since such a belief is similar to other beliefs.

Homomorphically encrypted consciousness and its implications
jessicata · 18d

See the paragraph at the end on the trivialism objection to functionalism.

Homomorphically encrypted consciousness and its implications
jessicata · 18d

No, but it's complicated. I wrote about the speed prior + QM previously here.

Homomorphically encrypted consciousness and its implications
jessicata · 18d

Speed-prior-type reasons. Like, a basic intuition is "my experiences are being produced somehow, by some process". The speed prior leads to "this process is at least somewhat efficient".

Like, usually if you see a hard computation being done (e.g. mining Bitcoin), you would assume it happened somewhere. If one's experiences are produced by some process, and that process is computationally hard, it raises the question "is the computation happening somewhere?"
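One standard way to make this precise (a sketch only; I'm using the Levin-complexity form here, which differs in details from Schmidhuber's original speed prior) is to penalize a program both for its length and for its running time:

$$Kt(x) = \min_p \{\, |p| + \log_2 t(p) : U(p) = x \,\}, \qquad S(x) \propto 2^{-Kt(x)}$$

where $U$ is a universal machine, $|p|$ is the program's length in bits, and $t(p)$ is the number of steps it takes. A process producing one's experiences gets non-negligible weight under $S$ only if it isn't too expensive to run, which is the sense in which "this process is at least somewhat efficient".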

Homomorphically encrypted consciousness and its implications
jessicata · 18d

Oh, maybe what you are imagining is that it is possible to perceive a homomorphically encrypted mind in progress, by encrypting yourself and feeding intermediate states of that other mind to your own homomorphically encrypted mind. Interesting hypothetical.

I think, with respect to "reality", I don't want to make the dogmatic assumption that "physics = reality", so I'm open to the possibility (C) that the computation occurs "in reality" even if not "in physics".

Homomorphically encrypted consciousness and its implications
jessicata · 19d

Right, so by step 4 I'm not trying to assume that h is computationally tractable; the homomorphic case goes to show that it probably isn't in general.

With respect to C, perhaps I'm not verbally expressing it that well, but the thing you are thinking of, where there is some omniscient perspective that includes "more than" just the low level of physics (where the "more than" could be certain informational/computational interconnections), would be an instance. Something like: "there is a way to construct an omniscient perspective, it just isn't going to be straightforwardly derivable from the physical state".

Homomorphically encrypted consciousness and its implications
jessicata · 19d

Yeah, that seems like a case where non-locality is essential to the computation itself. I'm not sure how the "provably random noise from both" would work, though. Like, it is possible to represent some string as the XOR of two different strings, each of which is itself uniformly random (sketch below). But I don't know how to generalize that to computation in general.
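Concretely, the two-string XOR trick is just 2-out-of-2 secret sharing (a minimal sketch; the function names are made up for the example): pick one share uniformly at random and XOR it against the secret to get the other share. Each share is uniformly distributed on its own, yet together they reconstruct the string:

```python
import os

def xor_split(secret: bytes) -> tuple[bytes, bytes]:
    """Split a secret into two shares, each individually uniform random."""
    share_a = os.urandom(len(secret))
    share_b = bytes(s ^ a for s, a in zip(secret, share_a))
    return share_a, share_b

def xor_combine(share_a: bytes, share_b: bytes) -> bytes:
    """Recover the secret by XORing the two shares together."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))

secret = b"some bit string"
a, b = xor_split(secret)
assert xor_combine(a, b) == secret
```

Each share is statistically independent of the secret, which is what makes it "provably random" in isolation.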

I think some of the non-locality is inherited from "no hidden variable theory". Like, it might be local in MWI? I'm not sure.

Homomorphically encrypted consciousness and its implications
jessicata · 19d

Hmm... I think with Solomonoff induction I would say R is the UTM input, plus the entire execution trace/trajectory. Then M would be like the agent's observations, which are a simple function of R.

I see that we can't have all "real" things being R-efficiently computable. But the thing about doxastic states is that some agent has access to them, so it seems like, from their perspective, they are "effective", being "produced somewhere"... so I infer they are probably "computed in reality" in some sense (although that's not entirely clear). They have access to their beliefs/observations in a more direct way than they have access to probabilities.

With respect to reversibility: the way I was thinking about it was that when the key is erased, it's erased really far away. Then the heat from the key gets distributed somehow; the information could even enter a black hole. Then there would be no way to retrieve it. (It shouldn't matter too much anyway: if natural supervenience is local, then mental states couldn't be affected by faraway physical states.)

Posts
Sorted by New
Homomorphically encrypted consciousness and its implications (33 karma, 19d, 38 comments)
A philosophical kernel: biting analytic bullets (64 karma, 3mo, 21 comments)
Measuring intelligence and reverse-engineering goals (33 karma, 3mo, 10 comments)
Towards plausible moral naturalism (17 karma, 4mo, 9 comments)
Generalizing zombie arguments (23 karma, 4mo, 9 comments)
The Weighted Perplexity Benchmark: Tokenizer-Normalized Evaluation for Language Model Comparison (21 karma, 4mo, 0 comments)
Why I am not a Theist (27 karma, 4mo, 6 comments)
"Self-Blackmail" and Alternatives (20 karma, 9mo, 12 comments)
On Eating the Sun (96 karma, 10mo, 98 comments)
2024 in AI predictions (125 karma, 10mo, 3 comments)