Discord: LemonUniverse (lemonuniverse). Reddit: u/Smack-works. Substack: The Lost Jockey. About my situation: here.
I wrote some weaker posts before 2024 because I was very uncertain about how events might develop.
Meta-level comment: I don't think it's good to dismiss original arguments immediately and completely.
Object-level comment:
Neither of those claims has anything to do with humans being the “winners” of evolution.
I think it might be more complicated than that:
Here are some things we can try:
I think the latter is the strongest counter-argument to "humans are not the winners".
My point is that chairs and humans can be considered in a similar way.
Please explain how your point connects to my original message: are you arguing against it, supporting it, or do you want to learn how my idea applies to something?
I see. But I'm not talking about figuring out human preferences; I'm talking about finding world-models in which real objects (such as "strawberries" or "chairs") can be identified. Sorry if that wasn't clear in my original message because I mentioned "caring".
Models or real objects or things capture something that is not literally present in the world. The world contains shadows of these things, and the most straightforward way of finding models is by looking at the shadows and learning from them.
You might need to specify what you mean a little bit.
The most straightforward way of finding a world-model is just predicting your sensory input. But then you're not guaranteed to get a model in which something corresponding to "real objects" can be easily identified. That's one of the main reasons why ELK is hard, I believe: in an arbitrary world-model, "Human Simulator" can be much simpler than "Direct Translator".
So how do humans get world-models in which something corresponding to "real objects" can be easily identified? My theory is in the original message. Note that the idea is not just "predict sensory input", it has an additional twist.
Creating an inhumanly good model of a human is related to formulating their preferences.
How does this relate to my idea? I'm not talking about figuring out human preferences.
Thus it's a step towards eliminating path-dependence of particular life stories.
What is "path-dependence of particular life stories"?
I think things (minds, physical objects, social phenomena) should be characterized by computations that they could simulate/incarnate.
Are there other ways to characterize objects? Feels like a very general (or even fully general) framework. I believe my idea can be framed like this, too.
There's an alignment-related problem: the problem of defining real objects. Relevant topics: environmental goals; task identification problem; "look where I'm pointing, not at my finger"; The Pointers Problem; Eliciting Latent Knowledge.
I think I realized how people go from caring about sensory data to caring about real objects. But I need help with figuring out how to capitalize on the idea.
So... how do humans do it?
For example, imagine you're just looking at ducks swimming in a lake. You notice that ducks don't suddenly disappear from your vision (permanence), their movement is continuous (continuity), and they seem to move in a 3D space (3D space). All those patterns ("permanence", "continuity" and "3D space") are useful for predicting aspects of immediate sensory input. But all those patterns are also useful for developing deeper theories of reality, such as the atomic theory of matter, because you can imagine that atoms are small things which continuously move in 3D space, similar to ducks. (This image works less well once you get to Quantum Mechanics, but then aspects of QM feel less "real" and less relevant for defining objects.) As a result, it's easy to see how the deeper model relates to surface-level patterns.
In other words: reality contains "real objects" to the extent to which deep models of reality are similar to (models of) basic patterns in our sensory input.
I don't understand the Model-Utility Learning (MUL) section: what pathological behavior does the AI exhibit?
Since humans (or something) must be labeling the original training examples, the hypothesis that building bridges means "what humans label as building bridges" will always be at least as accurate as the intended classifier. I don't mean "whatever humans would label". I mean the hypothesis that "build a bridge" means specifically the physical situations which were recorded as training examples for this system in particular, and labeled by humans as such.
So it's like overfitting? If I train a MUL AI to play piano in a green room, does the MUL AI learn that "playing piano" means "playing piano in a green room" or "playing piano in a room which would have been chosen for training me in the past"?
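Here's a toy sketch of the kind of failure I'm asking about. This is just a generic spurious-correlation illustration in Python, not an implementation of MUL itself, and all the feature names and numbers are made up:

```python
# Toy illustration (not MUL itself): a classifier trained only on
# "piano in a green room" examples has no way to tell whether the label
# tracks the piano or the room colour.
import numpy as np

# Features: [piano_is_playing, room_is_green]. In training, piano is ONLY
# ever played in the green room, so the two features are perfectly correlated.
X = np.array([[1, 1]] * 50 + [[0, 0]] * 50, dtype=float)
y = np.array([1] * 50 + [0] * 50, dtype=float)

# Plain logistic regression trained by gradient descent.
w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

def predict(x):
    return 1 / (1 + np.exp(-(x @ w + b)))

print("weights:", w)  # equal weight on "piano" and "green room" (by symmetry)
print("piano, green room:", predict(np.array([1.0, 1.0])))  # confident "yes"
print("piano, blue room: ", predict(np.array([1.0, 0.0])))  # much less confident:
# the training data never distinguished "playing piano" from "being in a green room"
```

Nothing in the training data forces the "piano" reading over the "green room" reading, and my question is whether MUL adds any guarantee on top of that.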
Now, we might reasonably expect that if the AI considers a novel way of “fooling itself” which hasn’t been given in a training example, it will reject such things for the right reasons: the plan does not involve physically building a bridge.
But "sensory data being a certain way" is a physical event which happens in reality, so a MUL AI might still learn to be a solipsist? MUL isn't guaranteed to solve misgeneralization in any way?
If the answer to my questions is "yes", what did we even hope for with MUL?
I'm noticing two things:
Often I see people dismiss the things the Epicureans got right with an appeal to their lack of the scientific method, which has always seemed a bit backwards to me.
The most important thing, I think, is not even hitting the nail on the head, but knowing (i.e. really acknowledging) that a nail can be hit in multiple places. If you know that, the rest is just a matter of testing.
But avoidance of value drift or of unendorsed long term instability of one's personality is less obvious.
What if endorsed long-term instability leads to a negation of personal identity too? (That's something I've thought about.)
...
Can you expand on this thought ("something can give less specific predictions, but be more general") or reference famous/professional people discussing it? This thought can be very trivial, but it can also be very controversial.
Right now I'm writing a post about "informal simplicity" or "conceptual simplicity". It discusses the simplicity of informal concepts (concepts which don't give specific predictions). I argue that "informal simplicity" should be very important a priori. But I don't know whether "informal simplicity" has been used (at least implicitly) by professional and famous people. Here's as much as I know (warning: controversial and potentially inaccurate takes!):
Zeno of Elea made arguments basically equivalent to "calculus should exist" and "theory of computation should exist" ("supertasks are a thing") using only basic math.
The success of neural networks is a success of some of the simplest mechanisms: backpropagation and attention. (Even though they can be heavy on math too.) We observed a complicated phenomenon (real neurons), we simplified it... and BOOM! (There's a short attention sketch after this list to make the point concrete.)
Arguably, many breakthroughs in early and late science were sealed behind simple considerations (e.g. the equivalence principle), not deduced from formal reasoning. Feynman diagrams weren't deduced from some specific math; they came from the desire to simplify.
Some fields "simplify each other" in some way. Physics "simplifies" math (via physical intuitions). Computability theory "simplifies" math (by limiting it to things which can be done by a series of steps). Rationality "simplifies" philosophy (by connecting it to practical concerns) and science.
To learn to fly, the Wright brothers had to analyze "simple" considerations.
Eliezer Yudkowsky influenced many people with very "simple" arguments. The rationalist community as a whole is a "simplified" approach to philosophy and science (to a degree).
The possibility of a logical decision theory can be deduced from simple informal considerations.
Albert Einstein used simple thought experiments.
Judging by the famous video interview, Richard Feynman liked to think about simple, informal descriptions of physical processes. And maybe Feynman talked about the "less precise, but more general" idea? Maybe he said that epicycles were more precise, but the heliocentric model was better anyway? I couldn't find it.
Terry Tao occasionally likes to simplify things (e.g. P=NP and multiple choice exams, Quantum mechanics and Tomb Raider, Special relativity and Middle-Earth, and Calculus as "special deals"). Is there more?
Some famous scientists didn't shy away from philosophy (e.g. Albert Einstein, Niels Bohr?, Erwin Schrödinger).
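To make the "simplest mechanisms" point from the neural networks item above concrete, here's a minimal sketch of scaled dot-product attention in plain NumPy. The function name and shapes are my own illustration, not any particular library's API:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted average of values

# Example: 3 tokens, each represented by a 4-dimensional vector.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(attention(Q, K, V).shape)  # (3, 4)
```

The core is just a similarity score, a softmax, and a weighted average; the rest of a transformer is built around this.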
Please share any thoughts or information relevant to this, if you have any! It's OK to write your own speculations/frames.