Nate Showell

Comments

A definition of physics that treats space and time as fundamental doesn't quite work, because there are theories in physics, such as loop quantum gravity, in which space and/or time arise from something else.

Answer by Nate Showell

"Seeing the light" to describe having a mystical experience. Seeing bright lights while meditating or praying is an experience that many practitioners have reported, even across religious traditions that didn't have much contact with each other.

Some other examples:

  1. Agency and embeddedness are fundamentally at odds with each other. Decision theory and physics are incompatible approaches to world-modeling, with each making assumptions that are inconsistent with the other. Attempting to build mathematical models of embedded agency will fail as an attempt to understand advanced AI behavior.
  2. Reductionism is false. If modeling a large-scale system in terms of the exact behavior of its small-scale components would take longer than the age of the universe, or would require a universe-sized computer, the large-scale system isn't explicable in terms of small-scale interactions even in principle. The Sequences are incorrect to describe non-reductionism as ontological realism about large-scale entities -- the former doesn't inherently imply the latter.
  3. Relatedly, nothing is ontologically primitive. Not even elementary particles: if, for example, you took away the mass of an electron, it would cease to be an electron and become something else. The properties of those particles, as well, depend on having fields to interact with. And if a field couldn't interact with anything, could it still be said to exist?
  4. Ontology creates axiology and axiology creates ontology. We aren't born with fully formed utility functions in our heads telling us what we do and don't value. Instead, we have to explore and model the world over time, forming opinions along the way about what things and properties we prefer. And in turn, our preferences guide our exploration of the world and the models we form of what we experience. Classical game theory, with its predefined sets of choices and payoffs, only has narrow applicability, since such contrived setups are only rarely close approximations to the scenarios we find ourselves in.

How does this model handle horizontal gene transfer? And what about asexually reproducing species? In those cases, the dividing lines between species are less sharply defined.

The ideas of the Cavern are the Ideas of every Man in particular; we every one of us have our own particular Den, which refracts and corrupts the Light of Nature, because of the differences of Impressions as they happen in a Mind prejudiced or prepossessed.

Francis Bacon, Novum Organum Scientiarum, Section II, Aphorism V

The reflective oracle model doesn't have all the properties I'm looking for -- it still has the problem of treating utility as the optimization target rather than as a functional component of an iterative behavior reinforcement process. It also treats the utilities of different world-states as known ahead of time, rather than as the result of a search process, and assumes that computation is cost-free. To get a fully embedded theory of motivation, I expect that you would need something fundamentally different from classical game theory. For example, it probably wouldn't use utility functions.

Why are you a realist about the Solomonoff prior instead of treating it as a purely theoretical construct?

A theory of embedded world-modeling would be an improvement over current predictive models of advanced AI behavior, but it wouldn't be the whole story. Game theory makes dualistic assumptions too (e.g., by treating the decision process as not having side effects), so we would also have to rewrite it into an embedded model of motivation.

 

Cartesian frames are one of the few lines of agent foundations research in the past few years that seem promising, due to allowing for greater flexibility in defining agent-environment boundaries. Preferably, we would have a model that lets us avoid having to postulate an agent-environment boundary at all. Combining a successor to Cartesian frames with an embedded theory of motivation, likely some form of active inference, might give us an accurate overarching theory of embedded behavior.

And this is where the fundamental AGI-doom arguments – all these coherence theorems, utility-maximization frameworks, et cetera – come in. At their core, they're claims that any "artificial generally intelligent system capable of autonomously optimizing the world the way humans can" would necessarily be well-approximated as a game-theoretic agent. Which, in turn, means that any system that has the set of capabilities the AI researchers ultimately want their AI models to have, would inevitably have a set of potentially omnicidal failure modes.

This is my crux with people who have 90+% P(doom): will vNM expected utility maximization be a good approximation of the behavior of TAI? You argue that it will, but I expect that it won't.

 

My thinking related to this crux is informed less by the behaviors of current AI systems (although they still influence it to some extent) than by the failure of the agent foundations agenda. The dream 10 years ago was that if we started by modeling AGI as a vNM expected utility maximizer, and then gradually added more and more details to our model to account for differences between the idealized model and real-world AI systems, we would end up with an accurate theoretical system for predicting the behaviors AGI would exhibit. It would be a similar process to how physicists start with an idealized problem setup and add in details like friction or relativistic corrections.

 

But that isn't what ended up happening. Agent foundations researchers ended up getting stuck on the cluster of problems collectively described as embedded agency, unable to square the dualistic assumptions of expected utility theory and Bayesianism with the embedded structure of real-world AI systems. The sub-problems of embedded agency are many and too varied to allow one elegant theorem to fix everything. Instead, they point to a fundamental flaw in the expected utility maximizer model, suggesting that it isn't as widely applicable as early AI safety researchers thought. 

 

The failure of the agent foundations agenda has led me to believe that expected utility maximization is only a good approximation for mostly-unembedded systems, and that an accurate theoretical model of advanced AI behavior (if such a thing is possible) would require a fundamentally different, less dualistic set of concepts. Coherence theorems and decision-theoretic arguments still rely on the old, unembedded assumptions and therefore don't provide an accurate predictive model. 

Philosophy is frequently (probably most of the time) done in order to signal group membership rather than as an attempt to accurately model the world. Just look at political philosophy or philosophy of religion. Most of the observations you note can be explained by philosophers operating at simulacrum level 3 instead of level 1.
