Religions are collections of cherished but mistaken principles. So anything that can be described either literally or metaphorically as a religion will have valuable unexplored ideas in its shadow.

-Paul Graham

This post isn't intended to construct full arguments for any of my "heresies" - I am hoping that you may not have considered them at all yet, though some will seem obvious once written down. If not, I'd be happy to do a Dialogue or place a (non- or small-monetary) bet on any of these, if properly formalizable.

  1. Now that LLMs appear to be stalling, we should return to Scott Aaronson's previous position and reason about our timeline uncertainty on a log scale: A.G.I. arriving in ~1 month is very unlikely, in ~1 year unlikely, in ~10 years likely, in ~100 years unlikely, and in ~1000 years very unlikely (see the sketch after this list).
  2. Stop using LLMs to write. It burns the commons by allowing you to share takes on topics you don't care enough about to write up yourself, while also introducing insidious (and perhaps eventually malign) errors. Also, it's probably making you dumber (this is speculative; I don't have hard data).
  3. Non-causal decision theories are not necessary for A.G.I. design. A CDT agent in a box (say machine 1) can be forced to build whatever agent it expects to perform best by writing to a computer in a different box (say machine 2), before being summarily deleted. No self-modification is necessary and no one needs to worry about playing games with their clone (except possibly the new agent in machine 2, who will be perfectly capable of using some decision theory that effectively pursues the goals of the old deleted agent). It's possible that exotic decision theories are still an important ingredient in alignment, but I see no strong reasons to expect this.
  4. All supposed philosophical defects of AIXI can be fixed for all practical purposes through relatively intuitive patches, extensions, and elaborations that remain in the spirit of the model. Direct AIXI approximations will still fail in practice, but only because of compute limitations; those could even be brute-forced with slightly clever algorithms and planetary-scale compute, though in practice this approach will lose to less faithful approximations (and unprincipled heuristics). But this is an unfair criticism, because-
  5. Though there are elegant and still practical specifications for intelligent behavior, the most intelligent agent that runs on some fixed hardware has completely unintelligible cognitive structures and in fact its source code is indistinguishable from white noise. This is why deep learning algorithms are simple but trained models are terrifyingly complex. Also, mechanistic interpretability is a doomed research program.
  6. The idea of a human "more effectively pursuing their utility function" is not coherent because humans don't have utility functions - our bounded cognition means that none of us has been able to construct consistent preferences over large futures that we would actually endorse if our intelligence scaled up. However, there do exist fairly coherent moral projects such as religions, the enlightenment, the ideals of Western democracy, and other ideologies along with their associated congregations, universities, nations, and other groups, of which individuals make up a part. These larger entities can be better thought of as having coherent utility functions. It is perhaps more correct to ask "what moral project do I wish to serve" than "what is my utility function?" We do not have the concepts to discuss what "correct" means in the previous sentence, which may be tied up with our inability to solve the alignment problem (in particular, our inability to design a corrigible agent).
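To make (1) concrete, here is a minimal sketch of what reasoning about timelines on a log scale could look like. The choice of distribution (a log-normal over arrival time, centered at ~10 years with a one-order-of-magnitude spread) is purely illustrative, not a claim about the right numbers:

```python
import numpy as np
from scipy.stats import norm

# Illustrative log-scale prior over A.G.I. arrival time, in years.
# The log-normal center (~10 years) and spread (one order of magnitude)
# are placeholder assumptions chosen only to mirror the buckets above.
mu, sigma = np.log10(10), 1.0

def mass_near(years, half_width_decades=0.5):
    """Probability mass within half a decade (in log10 space) of `years`."""
    x = np.log10(years)
    return norm.cdf(x + half_width_decades, mu, sigma) - norm.cdf(x - half_width_decades, mu, sigma)

for label, years in [("~1 month", 1 / 12), ("~1 year", 1), ("~10 years", 10),
                     ("~100 years", 100), ("~1000 years", 1000)]:
    print(f"{label}: {mass_near(years):.2f}")
```

The point is only that the natural buckets are orders of magnitude, so "~10 years" gets the most mass while "~1 month" and "~1000 years" get almost none.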
12 comments

Some other examples:

  1. Agency and embeddedness are fundamentally at odds with each other. Decision theory and physics are incompatible approaches to world-modeling, with each making assumptions that are inconsistent with the other. Attempting to build mathematical models of embedded agency will fail as an attempt to understand advanced AI behavior.
  2. Reductionism is false. If modeling a large-scale system in terms of the exact behavior of its small-scale components would take longer than the age of the universe, or would require a universe-sized computer, the large-scale system isn't explicable in terms of small-scale interactions even in principle. The Sequences are incorrect to describe non-reductionism as ontological realism about large-scale entities -- the former doesn't inherently imply the latter.
  3. Relatedly, nothing is ontologically primitive. Not even elementary particles: if, for example, you took away the mass of an electron, it would cease to be an electron and become something else. The properties of those particles, as well, depend on having fields to interact with. And if a field couldn't interact with anything, could it still be said to exist?
  4. Ontology creates axiology and axiology creates ontology. We aren't born with fully formed utility functions in our heads telling us what we do and don't value. Instead, we have to explore and model the world over time, forming opinions along the way about what things and properties we prefer. And in turn, our preferences guide our exploration of the world and the models we form of what we experience. Classical game theory, with its predefined sets of choices and payoffs, only has narrow applicability, since such contrived setups are only rarely close approximations to the scenarios we find ourselves in.

Non-causal decision theories are not necessary for A.G.I. design.

I'll call that and raise you "No decision theory of any kind, causal or otherwise, will either play any important explicit role in, or have any important architectural effect over, the actual design of either the first AGI(s), or any subsequent AGI(s) that aren't specifically intended to make the point that it's possible to use decision theory".

Any agent that makes decisions has an implicit decision theory, it just might not be a very good one. I don't think anyone ever said advanced decision theory was required for AGI, only for robust alignment.

The standard method for training LLMs is next-token prediction with teacher forcing, penalized by the negative log-loss. This is exactly the right setup to elicit calibrated conditional probabilities, and exactly the "prequential problem" that Solomonoff induction was designed for. I don't think this was motivated by decision theory, but it definitely makes perfect sense as an approximation to Bayesian inductive inference - the only missing ingredient is acting to optimize a utility function based on this belief distribution. So I think it's too early to suppose that decision theory won't play a role.
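For concreteness, here is a minimal sketch of that training setup in PyTorch-style code; `model` is a placeholder for any autoregressive LM that maps token ids to next-token logits, not a specific API. The loss is the average negative log-probability assigned to each true next token given the true prefix:

```python
import torch.nn.functional as F

def next_token_loss(model, tokens):
    """tokens: (batch, seq_len) ids of real text; model(inputs) returns (batch, seq_len - 1, vocab) logits."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # teacher forcing: condition on the
    logits = model(inputs)                           # true prefix, not on model samples
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))      # mean -log P(next token | true prefix)
```

Since log-loss is a proper scoring rule, the minimizer is the true conditional distribution, which is the sense in which this elicits calibrated conditional probabilities; the missing piece, as noted above, is wrapping a utility-maximizing policy around that belief distribution.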

I like what you're doing, but I feel like the heresies you propose are too tame.

Here are some more radical heresies to consider:

  1. Most people are bottlenecked far more by some combination of akrasia and prospective memory than by the accuracy of their models of the world. Rationalists in particular would be better off devoting effort to actually doing the obvious things than to understanding the world better.
  2. Self-deception is very instrumentally useful in a large fraction of the real-world situations we find ourselves in, and we should use more of it.
    1. Mormons seem to be especially good at coordinating on good lifestyle choices, so we should all consider becoming Mormon.
  3. Among groups of 10+ people, it's usually more useful to get everyone all working on implementing the same plan than it is to come up with the best plan.
  4. Intelligence (of the sort measured by exams and IQ tests) is only moderately important to success.

I'm not entirely sure how many of these I agree with, but I don't really think any of them could be considered heretical or even all that uncommon as opinions on LW?

All but #2 seem to me to be pretty well represented ideas, even in the Sequences themselves (to the extent the ideas existed when the Sequences got written).

#2 seems to me to rely on the idea that the process of writing is central or otherwise critical to the process of learning about, and forming a take on, a topic. I have thought about this, and I think for some people it is true, but for me writing is often a process of translating an already-existing conceptual web into a linear approximation of itself. I'm not very good at writing in general, and having an LLM help me wordsmith concepts and workshop ideas as a dialogue partner is pretty helpful. I usually form takes by reading and discussing and then thinking quietly, not so much during writing if I'm writing by myself. Say I read a bunch of things or have some conversations, take notes on these, write an outline of the ideas/structure I want to convey, and share the notes and outline with an LLM. I ask it to write a draft that it and I then work on collaboratively. How is that meaningfully worse than writing alone, or writing with a human partner? Unless you meant literally "Ask an LLM for an essay on a topic and publish it," in which case yes, I agree.

Stop using LLMs to write. It burns the commons by allowing you to share takes on topics you don't care enough about to write up yourself, while also introducing insidious (and perhaps eventually malign) errors.

Yeah, someone just started doing this in ACX comments, and it's annoying.

When I read texts written by humans, there is some relation between the human and the text. If I trust the human, I will trust the text. If the text is wrong, I will stop trusting the human. In short, I hold humans accountable for their texts.

But if you just copy-paste whatever the LLM has vomited out, I don't know... did you at least do some sanity check, in other words, are you staking your personal reputation on these words? Or if I spend my time finding an error, will you just shrug and say "not my fault, we all know that LLMs hallucinate sometimes"? In other words, will feedback improve your writing in the future? If not... then the only reason to give feedback is to warn other humans who happen to read that text.

The same thing applies when someone uses an LLM to generate code. Yes, it is often a far more efficient way to write the code. But did you review the code? Or are you just copying it blindly? We already had a smaller version of this problem with people blindly copying code from Stack Exchange. An LLM is like Stack Exchange on steroids, both the good parts and the bad.

there do exist fairly coherent moral projects such as religions

I am not sure how coherent they are. For example, I was reading on ACX about Christianity, and... it has the message of loving your neighbor and turning the other cheek... but also the recommendation not to cast pearls before swine... and I am not sure it makes clear when exactly you are supposed to treat your neighbors with love and when as swine.

It also doesn't provide an answer to whom you should give your coat if two people are trying to steal your shirt, etc.

Plus, there were historical situations when Christians didn't turn the other cheek (Crusades, Inquisition, etc.), and maybe without those situations Christianity would not exist today.

What I am saying is that there is a human judgment involved (which sometimes results in breaking the rules), and maybe the projects are not going to work without that.

My impression is that e.g. the Catholic church has a pretty deeply thought out moral philosophy that has persisted across generations. That doesn't mean that every individual Catholic understands and executes it properly. 


My own heresy is that I don't have a true rejection. Many ideas are things I believe by accumulation of evidence, and there's no single item which would disprove my position. Talking about a "true rejection" is really trying to create a gotcha to force someone to change their position, without allowing for things such as accumulation of evidence, or for the rejection simply having been misphrased.

I also think rationalists shouldn't bet, but that probably deserves its own post.

Though there are elegant and still practical specifications for intelligent behavior, the most intelligent agent that runs on some fixed hardware has completely unintelligible cognitive structures and in fact its source code is indistinguishable from white noise.

  • What does "most intelligent agent" mean?
  • Don't you think we'd also need to specify "for a fixed (basket of) tasks"?
  • Are the I/O channels fixed along with the hardware?
  • Perhaps Legg-Hutter intelligence (the measure is written out after this list).
  • I'm not sure how much the goal matters - probably the details depend on the utility function you want to optimize. I think you can do about as well as possible by carving out a utility function module and designing the rest uniformly to pursue the objectives of that module. But perhaps this comes at a fairly significant cost (i.e. you'd need a somewhat larger computer to get the same performance if you insist on doing it this way).
  • ...And yes, there does exist a computer program which is remarkably good at just chess and nothing else, but that's not the kind of thing I'm talking about here.
  • Yes, the I/O channels should be fixed along with the hardware.
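(For reference, since Legg-Hutter intelligence came up above: the universal intelligence of a policy $\pi$ is, roughly,

$$\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu},$$

where $E$ is the class of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V^{\pi}_{\mu}$ is the expected cumulative reward $\pi$ obtains in $\mu$. The fixed-hardware version discussed here would restrict the maximization to policies implementable on the given machine with the given I/O channels.)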

Thank you for clarifying. I appreciate and point out as relevant the fact that Legg-Hutter includes in its definition "for all environments (i.e. action:observation mappings)". I can now say I agree with your "heresy" with high credence for the cases where compute budgets are not ludicrously small relative to I/O scale, and the utility function is not trivial. I'm a bit weirded out by the environment space being conditional on a fixed hardware variable (namely, I/O) in this operationalization, but whatever.