Q Home

Discord: LemonUniverse (lemonuniverse). Reddit: u/Smack-works. Substack: The Lost Jockey. About my situation: here.

I wrote some worse posts before 2024 because I was very uncertain how events might develop.


Comments

Q Home10

So they overlook the simpler patterns because they pay less rent upfront, even though they are more general and a better investment long-term.

...

And if you use this metaphor to imagine what's going to happen to a tiny drop of water on a plastic table, you could predict that it will form a ball and refuse to spread out. While the metaphor may only be able to generate very uncertain & imprecise predictions, it's also more general.

Can you expand on this thought ("something can give less specific predictions, but be more general") or reference famous/professional people discussing it? This thought may be very trivial, but it may also be very controversial.

Right now I'm writing a post about "informal simplicity", "conceptual simplicity". It discusses simplicity of informal concepts (concepts not giving specific predictions). I make an argument that "informal simplicity" should be very important a priori. But I don't know if "informal simplicity" was used (at least implicitly) by professional and famous people. Here's as much as I know: (warning, controversial and potentially inaccurate takes!)

  • Zeno of Elea made arguments basically equivalent to "calculus should exist" and "theory of computation should exist" ("supertasks are a thing") using only basic math.

  • The success of neural networks is a success of some of the simplest mechanisms: backpropagation and attention. (Even though they can be heavy on math too.) We observed a complicated phenomenon (real neurons), we simplified it... and BOOM!

  • Arguably, many breakthroughs in early and late science were sealed behind simple considerations (e.g. the equivalence principle), not deduced from formal reasoning. Feynman diagrams weren't deduced from some specific math, they came from the desire to simplify.

  • Some fields "simplify each other" in some way. Physics "simplifies" math (via physical intuitions). Computability theory "simplifies" math (by limiting it to things which can be done by a series of steps). Rationality "simplifies" philosophy (by connecting it to practical concerns) and science.

  • To learn to fly, the Wright brothers had to analyze "simple" considerations.

  • Eliezer Yudkowsky influenced many people with very "simple" arguments. The rationalist community as a whole is a "simplified" approach to philosophy and science (to a degree).

  • The possibility of a logical decision theory can be deduced from simple informal considerations.

  • Albert Einstein used simple thought experiments.

  • Judging by the famous video interview, Richard Feynman likes to think about simple informal descriptions of physical processes. And maybe Feynman talked about the "less precise, but more general" idea? Maybe he said that epicycles were more precise, but a heliocentric model was better anyway? I couldn't find it.

  • Terry Tao occasionally likes to simplify things. (e.g. P=NP and multiple choice exams, Quantum mechanics and Tomb Raider, Special relativity and Middle-Earth and Calculus as “special deals”). Is there more?

  • Some famous scientists didn't shy away from philosophy (e.g. Albert Einstein, Niels Bohr?, Erwin Schrödinger).

Please, share any thoughts or information relevant to this, if you have any! It's OK if you write your own speculations/frames.

Q Home31

Meta-level comment: I don't think it's good to dismiss original arguments immediately and completely.

Object-level comment:

Neither of those claims has anything to do with humans being the “winners” of evolution.

I think it might be more complicated than that:

  1. We need to define what "a model produced by a reward function" means, otherwise the claims are meaningless. Like, if you made just a single update to the model (based on the reward function), calling it "a model produced by the reward function" is meaningless ('cause no real optimization pressure was applied). So we do need to define some goal of optimization (which determines who's a winner and who's a loser).
  2. We need to argue that the goal is sensible. I.e. somewhat similar to a goal we might use while training our AIs.

Here are some things we can try:

  • We can try defining all currently living species as winners. But is it sensible? Is it similar to a goal we would use while training our AIs? "Let's optimize our models for N timesteps and then use all surviving models regardless of any other metrics" <- I think that's not sensible, especially if you use an algorithm which can introduce random mutations into the model.
  • We can try defining species which avoided substantial changes for the longest time as winners. This seems somewhat sensible, because those species experienced the longest optimization pressure. But then humans are not the winners.
  • We can define any species which gained general intelligence as winners. Then humans are the only winners. This is sensible for two reasons. First, with general intelligence deceptive alignment is possible: if humans knew that Simulation Gods optimize organisms for some goal, humans could focus on that goal or kill all competing organisms. Second, many humans (in our reality) value creating AGI more than solving any particular problem.

I think the latter is the strongest counter-argument to "humans are not the winners".
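To make the three candidate definitions concrete, here is a minimal sketch. Everything in it is a made-up assumption for illustration: the `Lineage` type, its fields, and the four toy lineages are hypothetical and do not describe real species. It only shows that the set of "winners" depends entirely on which criterion is picked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    """Hypothetical toy record; the fields are illustrative assumptions."""
    name: str
    alive: bool                  # survived until the end of the run
    steps_unchanged: int         # timesteps without substantial change
    generally_intelligent: bool


# Made-up population, not data about real species.
population = [
    Lineage("A", True, 900, False),
    Lineage("B", True, 50, True),
    Lineage("C", False, 400, False),
    Lineage("D", True, 300, False),
]


def winners_all_survivors(pop):
    """Criterion 1: every lineage still alive counts as a winner."""
    return {l.name for l in pop if l.alive}


def winners_longest_unchanged(pop):
    """Criterion 2: the survivors that went unchanged for the longest time."""
    survivors = [l for l in pop if l.alive]
    best = max(l.steps_unchanged for l in survivors)
    return {l.name for l in survivors if l.steps_unchanged == best}


def winners_general_intelligence(pop):
    """Criterion 3: the survivors that gained general intelligence."""
    return {l.name for l in pop if l.alive and l.generally_intelligent}
```

In this toy population the three criteria pick three different sets, which is the sense in which "a model produced by the reward function" is underspecified until the goal is fixed.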

Q Home10

My point is that chairs and humans can be considered in a similar way.

Please explain how your point connects to my original message: are you arguing against it, supporting it, or asking how my idea applies to something?

Q Home10

I see. But I'm not talking about figuring out human preferences, I'm talking about finding world-models in which real objects (such as "strawberries" or "chairs") can be identified. Sorry if it wasn't clear in my original message because I mentioned "caring".

Models or real objects or things capture something that is not literally present in the world. The world contains shadows of these things, and the most straightforward way of finding models is by looking at the shadows and learning from them.

You might need to specify what you mean a little bit.

The most straightforward way of finding a world-model is just predicting your sensory input. But then you're not guaranteed to get a model in which something corresponding to "real objects" can be easily identified. That's one of the main reasons why ELK is hard, I believe: in an arbitrary world-model, "Human Simulator" can be much simpler than "Direct Translator".

So how do humans get world-models in which something corresponding to "real objects" can be easily identified? My theory is in the original message. Note that the idea is not just "predict sensory input", it has an additional twist.

Q Home10

Creating an inhumanly good model of a human is related to formulating their preferences.

How does this relate to my idea? I'm not talking about figuring out human preferences.

Thus it's a step towards eliminating path-dependence of particular life stories

What is "path-dependence of particular life stories"?

I think things (minds, physical objects, social phenomena) should be characterized by computations that they could simulate/incarnate.

Are there other ways to characterize objects? Feels like a very general (or even fully general) framework. I believe my idea can be framed like this, too.

Q Home52

There's an alignment-related problem, the problem of defining real objects. Relevant topics: environmental goals; task identification problem; "look where I'm pointing, not at my finger"; The Pointers Problem; Eliciting Latent Knowledge.

I think I realized how people go from caring about sensory data to caring about real objects. But I need help with figuring out how to capitalize on the idea.

So... how do humans do it?

  1. Humans create very small models for predicting very small/basic aspects of sensory input (mini-models).
  2. Humans use mini-models as puzzle pieces for building models for predicting ALL of sensory input.
  3. As a result, humans get models in which it's easy to identify "real objects" corresponding to sensory input.

For example, imagine you're just looking at ducks swimming in a lake. You notice that ducks don't suddenly disappear from your vision (permanence), their movement is continuous (continuity) and they seem to move in a 3D space (3D space). All those patterns ("permanence", "continuity" and "3D space") are useful for predicting aspects of immediate sensory input. But all those patterns are also useful for developing deeper theories of reality, such as atomic theory of matter. Because you can imagine that atoms are small things which continuously move in 3D space, similar to ducks. (This image stops working as well when you get to Quantum Mechanics, but then aspects of QM feel less "real" and less relevant for defining objects.) As a result, it's easy to see how the deeper model relates to surface-level patterns.

In other words: reality contains "real objects" to the extent to which deep models of reality are similar to (models of) basic patterns in our sensory input.

Q Home10

I don't understand the Model-Utility Learning (MUL) section: what pathological behavior does the AI exhibit?

Since humans (or something) must be labeling the original training examples, the hypothesis that building bridges means “what humans label as building bridges” will always be at least as accurate as the intended classifier. I don’t mean “whatever humans would label”. I mean the hypothesis that “build a bridge” means specifically the physical situations which were recorded as training examples for this system in particular, and labeled by humans as such.

So it's like overfitting? If I train MUL AI to play piano in a green room, MUL AI learns that "playing piano" means "playing piano in a green room" or "playing piano in a room which would be chosen for training me in the past"?
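A minimal sketch of the worry as I read it, with everything hypothetical: the `Situation` type, the two hypothesis functions, and the three training examples are made up for illustration and are not taken from the post. It shows two hypotheses that fit the training labels equally well and only come apart on a situation the training data never covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    """Hypothetical toy description of a training situation."""
    piano_played: bool
    room_color: str


def intended(s):
    """Intended concept: 'playing piano'."""
    return s.piano_played


def narrow(s):
    """Overly narrow concept: 'playing piano in a green room'."""
    return s.piano_played and s.room_color == "green"


# Made-up training set: every example happens to be in a green room.
training = [
    (Situation(True, "green"), True),
    (Situation(False, "green"), False),
    (Situation(True, "green"), True),
]


def accuracy(hypothesis, data):
    """Fraction of examples where the hypothesis matches the label."""
    return sum(hypothesis(s) == label for s, label in data) / len(data)


# A situation outside the training distribution: piano in a blue room.
novel = Situation(True, "blue")
```

Both hypotheses are perfectly accurate on `training`, so the labels alone cannot distinguish them; they disagree only on `novel`.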

Now, we might reasonably expect that if the AI considers a novel way of “fooling itself” which hasn’t been given in a training example, it will reject such things for the right reasons: the plan does not involve physically building a bridge.

But "sensory data being a certain way" is a physical event which happens in reality, so MUL AI might still learn to be a solipsist? MUL doesn't guarantee to solve misgeneralization in any way?

If the answer to my questions is "yes", what did we even hope for with MUL?

Q Home20

I'm noticing two things:

  1. It's suspicious to me that values of humans-who-like-paperclips are inherently tied to acquiring an unlimited amount of resources (no matter in which way). Maybe I don't treat such values as 100% innocent, so I'm OK keeping them in check. Though we can come up with thought experiments where the urge to get more resources is justified by something. Like, maybe instead of producing paperclips those people want to calculate Busy Beaver numbers, so they want more and more computronium for that.
  2. How consensual were the trades if their outcome is predictable and other groups of people don't agree with the outcome? Looks like coercion.

Q Home20

Often I see people dismiss the things the Epicureans got right with an appeal to their lack of the scientific method, which has always seemed a bit backwards to me.

The most important thing, I think, is not even hitting the nail on the head, but knowing (i.e. really acknowledging) that a nail can be hit in multiple places. If you know that, the rest is just a matter of testing.

Q Home10

But avoidance of value drift or of unendorsed long term instability of one's personality is less obvious.

What if endorsed long term instability leads to negation of personal identity too? (That's something I thought about.)
