jessicata

Jessica Taylor. CS undergrad and Master's at Stanford; former research fellow at MIRI.

I work on decision theory, social epistemology, strategy, naturalized agency, mathematical foundations, decentralized networking systems and applications, theory of mind, and functional programming languages.

Blog: unstableontology.com

Twitter: https://twitter.com/jessi_cata

Comments

Why's equality in logic less flexible than in category theory?
jessicata · 8h

I think with category theory, isomorphism is the obvious equivalence relation on objects in a category, whereas in set theory, which equivalence relation to use depends on context. E.g. we could consider the reals as equivalence classes of Cauchy sequences of rationals (equivalent when their difference converges to 0). The equivalence relation here is explicit; it's not like in category theory, where it follows straightforwardly from the other structure.
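Spelled out, the relation in question is the standard construction of the reals from Cauchy sequences:

\[
(a_n) \sim (b_n) \iff \lim_{n \to \infty} |a_n - b_n| = 0, \qquad \mathbb{R} := \{\text{Cauchy sequences in } \mathbb{Q}\}/\sim.
\]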

Why's equality in logic less flexible than in category theory?
jessicata · 1d

The thing you have said (presence of an isomorphism) is not equality in category theory. Set-theoretic equality is equality in category theory (assuming one is doing category theory with set-theoretic foundations). Like, we could consider a (small) category as a set of objects + a set of morphisms + a function assigning an ordered pair of objects (domain, codomain) to each morphism.

Rather, what you're talking about is a certain type of equivalence relation (presence of an isomorphism). It doesn't always behave like equality, because it is not equality.
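As a minimal sketch of the encoding mentioned above (hypothetical field names; identities and composition omitted): a small category is just set-theoretic data, and its objects compare by ordinary equality.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SmallCategory:
    objects: frozenset      # set of objects
    morphisms: frozenset    # set of morphisms
    dom_cod: dict           # morphism -> (domain object, codomain object)

cat = SmallCategory(
    objects=frozenset({"A", "B"}),
    morphisms=frozenset({"f", "g", "id_A", "id_B"}),
    dom_cod={"f": ("A", "B"), "g": ("B", "A"),
             "id_A": ("A", "A"), "id_B": ("B", "B")},
)

# "A" and "B" are linked by f and g (an isomorphism, if we also recorded that
# g after f = id_A and f after g = id_B), yet as elements of `objects` they are
# simply unequal: set-theoretic equality is the ambient notion of equality here.
print("A" == "B")  # False
```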

This is a review of the reviews
jessicata · 8d

The comment you were criticizing stated:

It seems that you have to really thread the needle to get from "5% p(doom)" to "we must pause, now!". You have to reason such that you are not self-interested but are also a great chauvinist for the human species.

This comment seems more to be resisting political action (pause AI) than pursuing it. If anything, your concern about political actors becoming monsters would apply more to the sort of people who want to create a world government to ban X globally than to people bringing up objections.

This is a review of the reviews
jessicata · 9d

I was assuming we're conditioning on the 1 in 20 chance of AI killing everyone.

Basically, I don't think the anti-"coercing others for ideological reasons" argument applies to the sort of person who thinks "well, I don't think a 1 in 20 chance of AI killing everyone is so bad that I'm going to support a political movement trying to ban AI research; for abstract reasons I think AI is still net positive under that assumption".

The action/inaction distinction matters here.

Notes on fatalities from AI takeover
jessicata · 9d

Insofar as I buy this argument, I would apply it to my own CEV, hence concluding that on reflection I would most prefer a universe without biological humans. Then I notice I'm probably not special, and probably a lot of other people's CEVs generate this conclusion too. This seems kind of heretical, but I'm not convinced it's wrong.

This is a review of the reviews
jessicata · 9d

This seems too pattern-matchy to be valid reasoning? Let's try an exercise where I rewrite the passage:

“They tried to crush us over and over again, but we wouldn’t be crushed. We drove off the AI researchers. We winkled out those who preached that superintelligence would be motivated to be moral, out of the churches and more importantly out of people’s minds. We got rid of the hardware sellers, thieving bastards, getting their dirty fingers in every deal, making every straight thing crooked. We dragged the gamers into the twenty-first century, and that was hard, that was a cruel business, and there were some painful years there, but it had to be done, we had to get the muck off our boots. We realised that there were saboteurs and enemies among us, and we caught them, but it drove us mad for a while, and for a while we were seeing enemies and saboteurs everywhere, and hurting people who were brothers, sisters, good friends, honest comrades...

[...] Working for the future made the past tolerable, and therefore the present. [...] So much blood, and only one justification for it. Only one reason it could have been all right to have done such things, and aided their doing: if it had been all prologue, all only the last spasms of the death of the old, unsafe, anti-human world, and the birth of a new safe, humanistic one.”

Aha, I have compared AI regulationists to the Communists, so they lose! Keep in mind that it is not the "accelerationist" position that requires centralized control and the stopping of business-as-usual, it is the "globally stop AI" one.

(But of course the details matter. Sometimes forcing others to pay costs works out net positively for both them and for you...)

Emergent morality in AI weakens the Orthogonality Thesis
jessicata · 1mo

I've written criticisms of orthogonality: The Obliqueness Thesis, Measuring intelligence and reverse-engineering goals.

While I do think human moral reasoning suggests non-orthogonality, it's a somewhat conceptually tricky case. So recently I've been thinking about more straightforward ways of showing non-orthogonality relative to an architecture.

For example, consider RL agents playing Minecraft. If you want agents that beat the game, you could encode this preference directly as a reward function: reward the agent when it beats the game. However, this fails in practice.

The alternative is reward shaping: reward the agent for pursuing instrumental values like exploring or getting new resources. This agent is much more likely to win, despite being misaligned.

What this shows is that reinforcement learning is a non-orthogonal architecture. Some goals (reward functions) lead to more satisfaction of convergent instrumental goals than others.
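To make the contrast concrete, here is a toy sketch of the two reward functions (the observation fields and weights are made up for illustration, not any particular Minecraft setup):

```python
# Sparse "true goal" reward: directly encodes the designer's preference.
# The signal is almost never encountered during training, so learning stalls.
def sparse_reward(obs) -> float:
    return 1.0 if obs["beat_game"] else 0.0

# Shaped reward: pays out for convergent instrumental subgoals (exploration,
# acquiring new resources). Misaligned with the true goal, but far easier to learn from.
def shaped_reward(obs, prev_obs) -> float:
    reward = 0.1 * (obs["tiles_explored"] - prev_obs["tiles_explored"])
    reward += 0.5 * (obs["new_resource_types"] - prev_obs["new_resource_types"])
    return reward
```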

A slightly trickier case is humans. Directly encoding inclusive fitness as human neural values seems like it would produce high fitness, but we don't see humans with this, so the space evolution is searching over is probably non-orthogonal.

Maybe it's like the RL case, where organisms are more likely to have high fitness if they have neural encodings of instrumental goals, which are easier to optimize short-term. Fixed action patterns suggest something like this: there's a "terminal value" of engaging in fixed action patterns (which happen to be ones that promote fitness; evolution searched over many possible fixed action patterns).

So instead of assuming "organisms get more fitness by having values aligned with inclusive fitness", we could reframe it as: "inclusive fitness is a meta-value over organisms (including their values); some values lead to higher inclusive fitness than others, empirically".
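A deliberately crude sketch of that reframing (the "values", the fitness function, and all numbers are stand-ins): evolution searches over value-vectors, and fitness scores them only via the behavior they produce.

```python
import random

def fitness_of(values):
    # Stand-in for "how well behavior driven by these values does at reproduction".
    explore, hoard, socialize = values
    return 2.0 * explore + 1.0 * hoard + 1.5 * socialize - 0.5 * explore * hoard

def evolve(generations=200, pop_size=50):
    # Population of value-vectors (weights on instrumental proxies), none of which
    # mention fitness; fitness acts only as a meta-value selecting among them.
    pop = [[random.random() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness_of, reverse=True)
        survivors = pop[: pop_size // 2]
        offspring = [[min(1.0, max(0.0, w + random.gauss(0, 0.05))) for w in parent]
                     for parent in random.choices(survivors, k=pop_size - len(survivors))]
        pop = survivors + offspring
    return max(pop, key=fitness_of)

print(evolve())  # the selected values track fitness only indirectly, via the meta-level search
```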

This approach could be used to study human morality. Maybe some tendencies to engage in moral reasoning lead to more fitness, even if moral reasoning isn't straightforwardly aligned with fitness. Perhaps because morality is a convenient proxy that works under bounded rationality.

A thesis would be something like: orthogonality holds for almost no architectures. Relative to an architecture like RL or neural encodings of values, there are almost always "especially smart values" that lead to more convergent instrumental goal achievement. Evolution will tend to find these empirically.

This doesn't contradict the claim that there is some architecture that is orthogonal, which I take to be the steelman of the orthogonality thesis. However, it suggests that even if this steelman is true, it has limited applicability to empirically realized agent architectures, and in particular doesn't apply to human preferences/morality.

A philosophical kernel: biting analytic bullets
jessicata · 1mo

it doesn't seem highly problematic that we can access mathematical facts that "live partially outside the universe" via "reasoning" or "logical correlation", where the computations in our minds are entangled in some way with computations or math that we're not physically connected to.

While this is one way to think about it, it seems, first of all, that it is limited to "small" mathematical facts that are computable in physics (not stuff like the continuum hypothesis). With respect to the entanglement: while it's possible to have a Bayes net where the mathematical fact "causes" both computers to output the answers, there's an alternative approach where the computers are two material devices that output the same answer because of physical symmetry. Two processes having symmetrical outputs doesn't in general indicate they're "caused by the same thing".

arguments in favor of some types of mathematical realism/platonism (e.g., universe and multiverse views of set theory)

Not familiar with these arguments. I think a formalist approach would be: the consistency of ZFC already implies a bunch of "small" mathematical facts (e.g. ZFC can't prove any false Π1 arithmetic statements). I think it's pretty hard to find a useful formal system that is strictly finitist; however, my intuition is that set theory goes too far. (This is part of why I have recently been thinking about "reverse mathematics", i.e. relatively weak second-order arithmetic theories like WKL0.)
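The parenthetical fact is the standard argument that, for a Π1 sentence ∀n φ(n) with φ bounded:

\[
\mathrm{ZFC} \vdash \forall n\,\varphi(n)
\quad\text{and}\quad
\neg\varphi(k) \text{ true for some } k
\;\Longrightarrow\;
\mathrm{ZFC} \vdash \neg\varphi(\overline{k})
\;\Longrightarrow\;
\mathrm{ZFC} \text{ is inconsistent,}
\]

since every true bounded (Δ0) sentence is provable. Contrapositively, Con(ZFC) implies ZFC proves no false Π1 arithmetic statements.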

Another reason I'm not ready to be super-convinced in this direction is that I think philosophy is often very hard and slow; therefore, as you say, "It is somewhat questionable to infer from lack of success to define, say, optimal decision theories, that no such decision theory exists."

Yeah, that makes sense. I think maybe what I've become more reluctant to endorse over time is a jump from "an intuition that something here works, plus alternative solutions failing" to "here, this thing I came up with, or something a lot like it, is going to work". Like going from the failure of CDT to the success of EDT, or the failure of CDT+EDT to TDT. There is not really any assurance that the new thing will work either.

we're not sure whether we'll eventually keep them when we're philosophically mature, and we don't know how to translate these values to a new ontology that lack these entities

I see that this is a practical consideration in many value systems, although perhaps either (a) the pragmatic considerations go differently for different people, or (b) different systems could be used for different pragmatic purposes. It at least presents a case for explaining the psychological phenomena of different ontologies/values, even ones that might fail under physicalism.

A philosophical kernel: biting analytic bullets
jessicata · 2mo

The precalculated "stochastic" variables thing, and the on-the-fly calls to the universe's rand() aren't the same thing, because they have different ontological implications.

Yeah, they can be distinguished ontologically, although there are going to be multiple Bayes nets expressing the same joint distribution, so it's not like there's going to be a canonical ordering.
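A tiny numerical illustration of that non-uniqueness (arbitrary made-up probabilities): two Bayes nets over binary A, B with opposite edge directions encode exactly the same joint distribution.

```python
# Net 1: A -> B
p_a = 0.3
p_b_given_a = {1: 0.8, 0: 0.4}
joint1 = {(a, b): (p_a if a else 1 - p_a)
                  * (p_b_given_a[a] if b else 1 - p_b_given_a[a])
          for a in (0, 1) for b in (0, 1)}

# Net 2: B -> A, with parameters read off from the same joint
p_b = sum(joint1[(a, 1)] for a in (0, 1))
p_a_given_b = {b: joint1[(1, b)] / sum(joint1[(a, b)] for a in (0, 1)) for b in (0, 1)}
joint2 = {(a, b): (p_b if b else 1 - p_b)
                  * (p_a_given_b[b] if a else 1 - p_a_given_b[b])
          for a in (0, 1) for b in (0, 1)}

assert all(abs(joint1[k] - joint2[k]) < 1e-12 for k in joint1)  # same distribution, no canonical ordering
```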

I would guess that the standard rationalist answer is "they are indistinguishable empirically". But rationalism lacks a proof that unempirical questions are unanswerable or meaningless (unlike logical positivism... but LP is explicitly rejected).

I get that active disbelief in further facts (such as counterfactuals) can be dogmatic. Rather, it's more a case of: we can get an adequate empirical account without them, and adding them has problems (like causal counterfactuals implying violations of physical law).

Part of where I'm coming from with this is a Chalmers-like framework. Suppose there are two possible universes; they have the same joint distribution but different causal orderings. Like, maybe in one the stochasticity is on the fly, and in the other it's pre-computed. They imply the same joint distribution and the same set of "straightforward" physical facts (particle trajectories and so on). Yet there is a distinction, a further fact.

In which case... the agents in these universes can't have epistemic access to these further facts; it's similar to the zombie argument. A simple approach is "no further facts", although assuming this is literally the case might be dogmatic. It's more like: don't believe in further facts prior to a good/convincing account of them, where the ontological complexity is actually worth it.

Note that compatibilism and naturalistic libertarianism are both viable given our present state of knowledge... so there is no necessity to adopt anti-realism.

Well, it's more that most specific theories of these have problems: the counterfactuals being really weird, corresponding to bad decision theories, etc. And it seems simpler to say the counterfactuals don't exist? Even if assigning high probability to that is dogmatic.

So much for MWI then... according to it, every world is counterfactual to every other.

If, instead of QM, our best physics said something like "there are true random coin flips", then it would be a bit of a stretch to posit an MWI-like theory there, i.e. that there exist other universes where the coin flips go differently. The case for MWI is somewhat more complex; it has to do with the Copenhagen interpretation being a lot more complicated than "here, have some stochastic coin flips".

How do you know counterfactuals require violations of physics itself? The possibility of something happening that wasn't what happened, only requires (genuine) indeterminism, as above.

Well, we can disjunct on high or low universal K-complexity. Assuming low universal K-complexity, counterfactuals really do have problems; there are a lot of implications. Assuming high universal K-complexity, I guess they're more well-defined. Though you can't counterfact on just anything; you have to counterfact on a valid quantum event. So how many counterfactuals there are depends on the density of quantum events relevant to, say, a computer.

I guess you could make the case from QM that the classical trajectory has high K-complexity, and therefore counterfactual alternatives to the classical trajectory don't require physical-law violations.

If not for QM, though, our knowledge would be compatible with determinism / low K-complexity of the classical trajectory, and it seems like a philosophy should be able to deal with that case (even if it empirically seems not to be the case).

You can hypothetically plan out a moon landing before you perform it for the first time.

Right, so counterfactual reasoning is practically useful; this is more about skepticism of the implied metaphysics. There might be translations, like observing that a deterministic system can be factored (in multiple ways) into interacting systems with inputs/outputs, each factoring implying additional facts about the deterministic system, without having to say that any of these factorings is correct in the sense of being correct about further facts.
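A minimal sketch of that kind of factoring (toy state space and update rule, chosen arbitrarily): the factored description reproduces the deterministic dynamics exactly, while supplying extra input/output structure to talk about.

```python
# Global deterministic system: state is a pair (x, y), updated jointly.
def global_step(state):
    x, y = state
    return ((x + y) % 10, (2 * x + 1) % 10)

# One factoring: x and y as subsystems exchanging their current values as inputs.
def subsystem_x(x, input_from_y):
    return (x + input_from_y) % 10

def subsystem_y(y, input_from_x):
    return (2 * input_from_x + 1) % 10

def factored_step(state):
    x, y = state
    return (subsystem_x(x, y), subsystem_y(y, x))

# The factoring implies additional facts ("what subsystem_x would output given a
# different input") without changing the underlying trajectory at all.
state = (3, 7)
for _ in range(5):
    assert global_step(state) == factored_step(state)
    state = global_step(state)
```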

A philosophical kernel: biting analytic bullets
jessicata · 2mo

Ah. I think, first of all, it is possible to do ontology in a materialist-directed or idealist-directed way, and the original post is materialist-directed.

I get that the joint distribution over physical facts determines a joint distribution over observations, and we couldn't observe further facts about the joint distribution beyond those implied by the distribution over observations.

I do feel there are a few differences, though. Like, in the process of "predicting as if physics", we would be expanding a huge hidden-variable theory yet declaring the elements of the theory unreal. Also there would be issues like: how large is the mental unit doing the analysis? Is it a single person over time or multiple people, and over how much time? What theory of personal identity? What is the boundary between something observed and not observed? (With physicalism, although having some boundary between observed and not observed is epistemically relevant, it doesn't have to be exactly defined, since it's not ontological; the ontology is something like an algebraic closure big enough to contain the state distinctions that are observed.)

I think maybe someone could try to make an idealist/solipsist minimal philosophy work, but it's not what I've done, and it doesn't seem easy to include this without running into problems like epistemic stability assumptions.

Posts

64 · A philosophical kernel: biting analytic bullets · 2mo · 21
33 · Measuring intelligence and reverse-engineering goals · 2mo · 10
17 · Towards plausible moral naturalism · 3mo · 9
23 · Generalizing zombie arguments · 3mo · 9
21 · The Weighted Perplexity Benchmark: Tokenizer-Normalized Evaluation for Language Model Comparison · 3mo · 0
27 · Why I am not a Theist · 3mo · 6
20 · "Self-Blackmail" and Alternatives · 8mo · 12
96 · On Eating the Sun · 9mo · 98
125 · 2024 in AI predictions · 9mo · 3
96 · The Obliqueness Thesis · 1y · 19