(draft of a future post)
I want to share my model of intelligence and research. You won't agree with it at first glance. Or at the third glance. (My hope is that you will simply give up and agree at the 20th glance.)
But that's supposed to be good: it means the model is original and brave enough to make risky statements.
In this model, any difference in "intelligence levels", or any difference between two minds in general, boils down to "commitment level".
On some level, "commitment" is just a word; it isn't needed to define the ideas I'm going to talk about. What matters much more are the three levels of commitment. Many topics have three levels which follow the same pattern, the same outline:
Level 1. You explore a single possibility.
Level 2. You want to explore all possibilities. But you are paralyzed by the number of possibilities. At this level you are interested in the qualities of possibilities. You classify possibilities and types of possibilities.
Level 3. You explore all possibilities through a single possibility. At this level you are interested in the dynamics of moving through the possibility space. You classify implications of possibilities.
...
I'm going to give specific examples of the pattern above. This post is kind of repetitive, but it wasn't AI-generated, I swear. Repetition is a part of commitment.
My explanation won't be clear before you read the post, but here it is:
Commitment describes your values and the "level" of your intentionality.
Commitment describes your level of intelligence (in a particular topic), compared to your own potential or to other people.
Commitments are needed for communication. Without shared commitments it's impossible for two people to find a common ground.
Commitment describes the "true content" of an argument, an idea, a philosophy. Ultimately, any property of a mind boils down to "commitments".
I think there are three levels of commitment to exploration.
Level 1. You treat things as immediate means to an end.
Imagine two enemy cavemen teleported into a laboratory. They try to use whatever they find to beat each other, without studying or exploring what they're using. So they just throw microscopes and beakers at each other. They throw anti-matter guns at each other without even activating them.
Level 2. You explore things for the sake of it.
Think about mathematicians. They can explore math without any goal.
Level 3. You use particular goals to guide your exploration of things. Even though you would care about exploring them without any goal anyway. The exploration space is just too large, so you use particular goals to narrow it down.
Imagine a physicist who explores mathematics by considering imaginary universes and applying physical intuition to discover deep mathematical facts. Such a person uses a particular goal/bias to guide "pure exploration". (Inspired by Edward Witten; see Michael Atiyah's quote.)
In terms of exploring ideas, our culture is at level 1 (angry caveman). We understand ideas only as "ideas for getting something (immediately)" or "ideas for proving something (immediately)". We are not interested in exploring ideas for the sake of it. The only metrics we apply to ideas are "(immediate) usefulness" and "trueness", not "beauty", "originality" and "importance". People in general are at level 1. Philosophers are at level 1 or "1.5". The rationality community is at level 1 too (sadly): rationalists still mostly care only about immediate usefulness and truth.
In terms of exploring argumentation and reasoning, our culture is at level 1. If you have never thought "stupid arguments don't exist", then you are at level 1: you haven't explored arguments and reasoning for the sake of it; you immediately jumped to assuming "The Only True Way To Reason" (be it your intuition, the scientific method, a particular ideology or Bayesian epistemology). You haven't stepped outside of your perspective a single time. Almost everyone is at level 1. Eliezer Yudkowsky is at level 3, but in a much narrower field: Yudkowsky explored rationality with the specific goal/bias of AI safety. However, overall Eliezer is at level 1 too: he never studied human reasoning outside of what he thinks is "correct".
I think this is kind of bad. We are at level 1 in the main departments of human intelligence and human culture. Two levels below our true potential.
I think there are three levels of commitment to goals.
Level 1. You have a specific selfish goal.
"I want to get a lot of money" or "I want to save my friends" or "I want to make a ton of paperclips", for example.
Level 2. You have an abstract goal. But this goal doesn't imply much interaction with the real world.
"I want to maximize everyone's happiness" or "I want to prevent (X) disaster", for example. This is a broad goal, but it doesn't imply actually learning and caring about anyone's desires (until the very end). Rationalists are at this level of commitment.
Level 3. You use particular goals to guide your abstract goals.
Some political activists are at this level of commitment. (But please, don't bring CW topics here!)
"Commitment to updating" is the ability to re-start your exploration from the square one. I think there are three levels to it.
Level 1. No updating. You never change ideas.
You just keep piling up your ideas into a single paradigm your entire life. You turn beautiful ideas into ugly ones so they fit with all your previous ideas.
Level 2. Updating. You change ideas.
When you encounter a new beautiful idea, you are ready to reformulate your previous knowledge around the new idea.
Level 3. Updating with "check points". You change ideas, but you use old ideas to prime new ones.
When you explore an idea, you mark some "check points" which you reached with that idea. When you ditch the idea for a new one, you still keep in mind the check points you marked. And use them to explore the new idea faster.
I think there are three levels of commitment in theory-building.
Level 1.
You build your theory using only "almost facts". I.e. you come up with "trivial" theories which are almost indistinguishable from the things we already know.
Level 2.
You build your theory on speculations. You "fantasize" important properties of your idea (which are important only to you or your field).
Level 3.
You build your theory on speculations. But those speculations are important even outside of your field.
I think Eliezer Yudkowsky and LW did theory-building of the 3rd level. A bunch of LW ideas are philosophically important even if you disagree with Bayesian epistemology (Eliezer's view on ethics and math, logical decision theories and some Alignment concepts).
I think there are three levels of commitment in explaining a phenomenon.
Level 1.
You just want to predict the phenomenon. But many, many possible theories can predict the phenomenon, so you need something more.
Level 2.
You compare the phenomenon to other phenomena and focus on its qualities.
That's where most of theories go wrong: people become obsessed with their own fantasies about qualities of a phenomenon.
Level 3.
You focus on dynamics which connect this phenomenon to other phenomena. You focus on overlapping implications of different phenomena. 3rd level is needed for any important scientific breakthrough. For example:
Imagine you want to explain combustion (why/how things burn). On one hand, you already "know everything" about the phenomenon, so what do you even do? Level 1 doesn't work. So you try to think about qualities of burning, types of transformations, types of movement... but that won't take you anywhere. Level 2 doesn't work either. The right answer: you need to think not about qualities of transformations and movements, but about dynamics (conservation of mass, the kinetic theory of gases) which connect different types of transformations and movements. Level 3 works.
I think there are three levels of commitment in epistemology.
Level 1. You assume the primary reality of the physical world. (Physicism)
Take statements "2 + 2 = 4" and "God exists". To judge those statements, a physicist is going to ask "Do those statements describe reality in a literal way? If yes, they are true."
Level 2. You assume the primary reality of statements of some fundamental language. (Descriptivism)
To judge statements, a descriptivist is going to ask "Can those statements be expressed in the fundamental language? If yes, they are true."
Level 3. You assume the primary reality of semantic connections between statements of languages. And the primary reality of some black boxes which create those connections. (Connectivism) You assume that something physical shapes the "language reality".
To judge statements, a connectivist is going to ask "Do those statements describe an important semantic connection? If yes, they are true."
...
Recap. Physicist: everything "physical" exists. Descriptivist: everything describable exists. Connectivist: everything important exists. A physicist can be too specific and a descriptivist can be too generous. (This pattern of being "too specific" or "too generous" repeats for all commitment types.)
Thinking at the level of semantic connections should be natural to people (because they use natural language and... neural nets in their brains!). And yet this idea is extremely alien to people epistemology-wise.
In general, rationalists are "confused" between level 1 and level 2. I.e. they often treat level 2 very seriously, but aren't fully committed to it.
Eliezer Yudkowsky is "confused" between level 1 and level 3. I.e. Eliezer has a lot of "level 3 ideas", but doesn't apply level 3 thinking to epistemology in general.
So, Eliezer has a bunch of ideas which can be interpreted as "some maps ARE the territory".
I think there are three levels of commitment in doubting one's own reasoning.
Level 1.
You're uncertain about the superficial "correctness" of your reasoning. You worry that you missed a particular counterargument. Example: "I think humans are dumb. But maybe I missed a smart human or applied a wrong test?"
Level 2.
You unsystematically doubt your assumptions and definitions. Maybe even your inference rules a little bit (see "inference objection"). Example: "I think humans are dumb. But what is a "human"? What is "dumb"? What is "is"? And how can I be sure of anything at all?"
Level 3.
You doubt the semantic connections (e.g. inference rules) in your reasoning. You consider the particular dynamics created by your definitions and assumptions. "My definitions and assumptions create this dynamic (not present in all people). Can this dynamic exploit me?"
Example: "I think humans are dumb. But can my definition of "intelligence" exploit me? Can my pessimism exploit me? Can this be an inconvenient way to think about the world? Can my opinion turn me into a fool even if I'm de facto correct?"
...
Level 3 is like "security mindset" applied to your own reasoning. LW rationality mostly teaches against it, suggesting you always take your smallest opinions at face value as "the truest thing you know". With some exceptions, such as "ethical injunctions", "radical honesty", "black swan bets" and "security mindset".
I think there are three levels of commitment in understanding your opponent.
Level 1.
You can pass the Ideological Turing Test in a superficial way (you understand the structure of the opponent's opinion).
Level 2. "Telepathy".
You can "inhabit" the emotions/mindset of your opponent.
Level 3.
You can describe the opponent's position as a weaker version/copy of your own position. And additionally you can clearly imagine how your position could turn out to be "the weaker version/copy" of the opponent's position. You find a balance between telepathy and "my opinion is the only one which makes sense!"
I think there are three levels of commitment in "resolving" problems.
Level 1.
You treat a problem as a puzzle to be solved by Your Favorite True Epistemology.
Level 2.
You treat a problem as a multi-layered puzzle which should be solved on different levels.
Level 3.
You don't treat a problem as a self-contained puzzle. You treat it as a "symbol" in the multitude of important languages. You can solve it by changing its meaning (by changing/exploring the languages).
Applying this type of thinking to the Unexpected hanging paradox:
I don't treat this paradox as a chess puzzle: I don't think it's something that could be solved or even "made sense of" from the inside. You need outside context. Like, does it ask you to survive? Then you can simply expect the hanging every day and be safe. (Though - can you do this to your psychology?) Or does the paradox ask you to come up with formal reasoning rules to solve it? But you can make any absurd reasoning system; to make a meaningful system you need to answer "for what purposes is this system going to be needed, other than this paradox?". So, I think that "from the inside" there's no ground truth (though it can exist "from the outside"). Without context there are a lot of simple but absurd or trivial solutions, like "ignore logic, think directly about outcomes" or "come up with some BS reasoning system". Or say "Solomonoff induction solves all paradoxes: even if it doesn't, it's the best possible predictor of reality, so just ignore philosophers, lol".
I think there are three levels of commitment in morality.
Level 1. Norms, desires.
You analyze norms of specific communities and desires of specific people. That's quite easy: you are just learning facts.
Level 2. Ethics and meta-ethics.
You analyze similarities between different norms and desires. You get to pretty abstract and complicated values such as "having agency, autonomy, freedom; having an interesting life; having an ability to form connections with other people". You are lost in contradictory implications, interpretations and generalizations of those values. You have a (meta-)ethical paralysis.
Level 3. "Abstract norms".
You analyze similarities between implications of different norms and desires. You analyze dynamics created by specific norms. You realize that the most complicated values are easily derivable from the implications of the simplest norms. (Not without some bias, of course, but still.)
I think moral philosophers and Alignment researchers are seriously dropping the ball by ignoring the 3rd level. Acknowledging the 3rd level doesn't immediately solve Alignment, but it can pretty much "solve" ethics (with a bit of effort).
I think there are three levels of values.
Level 1. Inside values ("feeling good").
You care only about things inside of your mind. For example, do you feel good or not?
Level 2. Real values.
You care about things in the real world. Even though you can't care about them directly. But you make decisions to not delude yourself and not "simulate" your values.
Level 3. Semantic values.
You care about elements of some real system. And you care about the proper dynamics of this system. For example, you care about things your friend cares about. But it's also important to you that your friend is not brainwashed, not controlled by you. And you accept that one day your friend may stop caring about anything. (Your value may "die" a natural death.)
3rd level is the level of "semantic values". They are not "terminal values" in the usual sense. They can be temporal and history-dependent.
So, you're interested in ways in which an AI can go wrong. What specifically can you be interested in? I think there are three levels to it.
Level 1. In what ways are some AI actions bad?
You classify AI bugs into types. For example, you find "reward hacking" type of bugs.
Level 2. What qualities of AIs are good/bad?
You classify types of bugs into "qualities". You find such potentially bad qualities as "AI doesn't care about the real world" and "AI doesn't allow itself to be corrected (lack of corrigibility)".
Level 3. What bad dynamics are created by bad actions of AI? What good dynamics are destroyed?
Assume AI turned humanity into paperclips. What's actually bad about that, beyond the very first obvious answer? What good dynamics did this action destroy? (Some answers: it destroyed the feedback loop, the connection between the task and its causal origin (humanity), the value of paperclips relative to other values, the "economical" value of paperclips, the ability of paperclips to change their value.)
On the 3rd level you classify different dynamics. I think people completely ignore the 3rd level. In both Alignment and moral philosophy. 3rd level is the level of "semantic values".
I think Security Mindset has three levels of commitment.
Level 1. Ordinary paranoia.
You have great imagination, you can imagine very creative attacks on your system. You patch those angles of attack.
Level 2. Security Mindset.
You study your own reasoning about the safety of the system. You check whether your assumptions are right or wrong. Then you try to delete as many assumptions as you can, even if they seem correct to you! You also delete anomalies of the system even if they seem harmless. You try to simplify your reasoning about the system, seemingly "for the sake of it".
Level 3.
You design a system which would be safe even in a world with changing laws of physics and mathematics. Using some bias, of course (otherwise it's impossible).
Humans, or at least idealized humans, are "level 3 safe". All or almost all current approaches to Alignment don't give you a "level 3 safe" AI.
I think there are three levels of commitment a (mis)aligned AI can have. Alternatively, those are three or two levels at which you can try to solve the Alignment problem.
Level 1.
AI has a fixed goal or a fixed method of finding a goal (which likely can't be Aligned with humanity). It respects only its own agency. So, ultimately it does everything it wants.
Level 2.
AI knows that different ethics are possible and is completely uncertain about ethics. It respects only other people's agency. So it doesn't do anything at all (except preventing, a bit lazily, 100% certain destruction and oppression). Or it requires infinite permission.
Level 3.
AI can respect both its own agency and the agency of humanity. AI finds a way to treat its agency as the continuation of the agency of people. AI makes sure it doesn't create any dynamic which couldn't be reversed by people (unless there's nothing else to do). So, AI can both act and be sensitive to people.
I think a fully safe system exists only on level 3. The safest system is the one which understands what "exploitation" means, so it never willingly exploits its rewards in any way. Humans are an example of such a system.
I think alignment researchers are "confused" between level 1 and level 3. They try to fix different "exploitation methods" (ways AI could exploit its rewards) instead of making the AI understand what "exploitation" means.
I also think this is the reason why alignment researchers don't cooperate much, pushing in different directions.
Commitments exist even on the level of perception. There are three levels of properties to which your perception can react.
Level 1. Inherent properties.
You treat objects as having more or less inherent properties. "This person is inherently smart."
Level 2. Meta-properties.
You treat any property as universal. "Anyone is smart under some definition of smartness."
Level 3. Semantic properties.
You treat properties only as relatively attached to objects: different objects form a system (a "language") where properties get distributed between them and differentiated. "Everyone is smart, but in a unique way. And those unique ways are important in the system."
I think there are three levels of commitment to experiences.
Level 1.
You're interested in particular experiences.
Level 2.
You want to explore all possible experiences.
Level 3.
You're interested in the real objects which produce your experiences (e.g. your friends): you're interested in what knowledge "all possible experiences" could reveal about them. You want to know where physical/mathematical facts and experiences overlap.
I think there are three levels of investigating the connection between experience and morality.
Level 1.
You study how experience causes us to do good or bad things.
Level 2.
You study all the different experiences "goodness" and "badness" causes in us.
Level 3.
You study dynamics created by experiences, which are related to morality. You study implications of experiences. For example: "loving a sentient being feels fundamentally different from eating a sandwich. food taste is something short and intense, but love can be eternal and calm. this difference helps to not treat other sentient beings as something disposable"
I think the existence of the 3rd level isn't acknowledged much. And yet it could be important for alignment. Most versions of moral sentimentalism are 2nd level at best. Epistemic Sentimentalism can be 3rd level.
You can ponder your commitment to specific things.
Are you committed to information?
Imagine you could learn anything (and forget it if you want). Would you be interested in learning different stuff more or less equally? You could learn something important (e.g. the most useful or the most abstract math), but you also could learn something completely useless - such as the life story of every ant who ever lived.
I know, this question is hard to make sense of: of course, anyone would like to learn everything/almost everything if there was no downside to it. But if you have a positive/negative commitment about the topic, then my question should make some sense anyway.
Are you committed to people?
Imagine you got an extra two years to just talk to people. To ordinary people on the street or ordinary people on the Internet.
Would you be bored hanging out with them?
My answers: >!Maybe I was committed to information in general as a kid. Then I became committed to information related to people, produced by people, known by people.!<
I encountered a bunch of people who are more committed to exploring ideas (and taking ideas seriously) than usual. More committed than most rationalists, for example.
But I felt those people lack something:
They are able to explore ideas, but don't care about that anymore. They care only about their own clusters of idiosyncratic ideas.
They have very vague goals which are compatible with any specific actions.
They don't care if their ideas could even in principle matter to people. They have "disconnected" from other people, from other people's context (through some level of elitism).
When they acknowledge you as "one of them", they don't try to learn your ideas or share their ideas or argue with you or ask your help for solving a problem.
So, their commitment remains very low. And they are not "committed" to talking.
If you have a high level of commitment (3rd level) at least to something, then we should find a common language. You may even be like a sibling to me.
Thank you for reading this post. 🗿
I think there are three levels in the relationship between patterns and causality. I'm going to give examples about visual patterns.
Level 1.
You learn which patterns are impossible due to local causal processes.
For example: "I'm unlikely to see a big tower made of eggs standing on top of each other". It's just not a stable situation due to very familiar laws of physics.
Level 2.
You learn statistical patterns (correlations) which can have almost nothing to do with causality.
For example: "people like to wear grey shirts".
Level 3.
You learn patterns which have a strong connection to other patterns and basic properties of images. You could say such patterns are created/prevented by "global" causal processes.
For example: "I'm unlikely to see a place fully filled with dogs. dogs are not people or birds or insects, they don't create such crowds". This is very abstract, connects to other patterns and basic properties of images.
I think:
It's likely that Machine Learning models don't learn level 3 patterns as well as they could, as sharply as they could.
Machine Learning models should be 100% able to learn level 3 patterns. It shouldn't require any specific data.
Learning/comparing level 3 patterns is interesting enough on its own. It could be its own area of research. But we don't apply statistics/Machine Learning to try to mine those patterns. This may be a missed opportunity for humans.
I think researchers are making a blunder by not asking "what kinds of patterns exist? what patterns can be learned in principle?" (not talking about universal approximation theorem)
Suppose you want to study different cognitive processes, skills, types of knowledge. There are three levels:
You study particular cognitive processes.
You study qualities of cognitive processes.
You study dynamics created by cognitive processes. How "actions" of different cognitive processes overlap.
I think you can describe different cognitive processes in terms of patterns they learn. For example:
I think all this could be easily enough formalized.
Can you be committed to exploring commitment?
I think yes.
One thing you can do is to split topics into sub-topics and raise your commitment in every particular sub-topic. Vaguely similar to gradient descent. That's what I've been doing in this post so far.
Another thing you can do is to apply recursion. You can split any topic into 3 levels of commitment. But then you can split the third level into 3 levels too. So, there's potentially an infinity of levels of commitment. And there can be many particular techniques for exploiting this fact.
But the main thing is the three levels of "exploring ways to explore commitment":
I don't have enough information or experience for the 3rd level right now.
*A more "formal" version of the draft (it's a work in progress): *
There are two interpretations of this post, weak and strong.
Weak interpretation:
I describe a framework about "three levels of exploration". I use the framework to introduce some of my ideas. I hope that the framework will give more context to my ideas, making them more understandable. I simply want to find people who are interested in exploring ideas. Exploring just for the sake of exploring or for a specific goal.
Strong interpretation:
I use the framework as a model of intelligence. I claim that any property of intelligence boils down to the "three levels of exploration". Any talent, any skill. The model is supposed to be "self-evident" because of its simplicity, it's not based on direct analysis of famous smart people.
Take the strong interpretation with a lot of grains of salt, of course, because I'm not an established thinker and I haven't achieved anything intellectual. I just thought "hey, this is a funny little simple idea, what if all intelligence works like this?", that's all.
That said, I'll need to make a couple of extraordinary claims "from inside the framework" (i.e. assuming it's 100% correct and 100% useful). Just because that's in the spirit of the idea. Just because it allows me to explore the idea to its logical conclusion. Definitely not because I'm a crazy man. You can treat the most outlandish claims as sci-fi ideas.
Can you "reduce" thinking to a single formula? (Sounds like cringe and crackpottery!)
Can you show a single path of the best and fastest thinking?
Well, there's an entire class of ideas which attempt to do this in different fields, especially the first idea:
Bayesian epistemology: "epistemology in a single rule" (the rule of updating beliefs)
Utilitarianism, preference utilitarianism: "(meta-)ethics in a single rule"
Baconian method, the prototype of the scientific method: "science in a single rule"
Hegelian dialectic: "philosophy in a single process"
Marxist dialectic: "history in a single process"
My idea is just another attempt at reduction. You don't have to treat such attempts 100% seriously in order to find value in them. You don't have to agree with them.
Let's introduce my framework.
In any topic, there are three levels of exploration:
The point is that at the 2nd level you study similarities between different X directly, but at the 3rd level you study similarities indirectly through new concepts Y and D. The letter "D" means "dynamics".
I claim that any property of intelligence can be boiled down to your "exploration level". Any talent, any skill and even more vague things such as "level of intentionality". I claim that the best and most likely ideas come from the 3rd level. That 3rd level defines the absolute limit of currently conceivable ideas. So, it also indirectly defines the limit of possible/conceivable properties of reality.
You don't need to trust those extraordinary claims. If the 3rd level simply sounds interesting enough to you and you're ready to explore it, that's good enough.
A vague description of the three levels:
Or:
Or:
So yeah, it's a Hegelian dialectic rip-off. Down below are examples of applying my framework to different topics. You don't need to read them all, of course.
I think there are three levels of exploring arguments:
If you want to get a real insight about argumentation, you need to study how (D) arguments change/get changed by some new thing Y. D and Y need to be important even outside of the context of explicit argumentation.
For example, Y can be "concepts". And D can be "connecting/separating" (a fundamental process which is important in a ton of contexts). You can study in what ways arguments connect and separate concepts.
A simplified political example: a capitalist can tend to separate concepts ("bad things are caused by mistakes and bad actors"), while a socialist can tend to connect concepts ("bad things are caused by systemic problems"). Conflict Vs. Mistake^(1) is just a very particular version of this dynamic. Different manipulations of concepts create different arguments and different points of view. You can study all such dynamics. You can trace arguments back to fundamental concept manipulations. It's such a basic idea, and yet nobody has done it for arguments in general; Aristotle did it 2400 years ago, but only for formal logic.
^(1. I don't agree with Scott Alexander, by the way.)
I think most of us are at level 1 in argumentation: we throw arguments at each other like angry cavemen without studying what an "argument" is and/or what dynamics it creates. If you completely unironically think that "stupid arguments" exist, then you're probably on the 1st level. Professional philosophers are at level 2 at best, but usually lower (they are surprisingly judgemental). At least they are somewhat forced to be tolerant of the most diverse types of arguments due to their profession.
On what level are you? Have you studied arguments without judgement?
I think there are three levels in understanding your opponent:
For example, Y can be "copies of the same thing" and D can be "transformations of copies into each other". Such Y and D are important even outside of debates.
So, on the 3rd level you may be able to describe the opponent's position as a weaker version/copy of your own position (Y) and clearly imagine how your position could turn out to be "the weaker version/copy" of the opponent's views. You can imagine how opponent's opinion transforms into truth and your opinion transforms into a falsehood (D).
Other interesting choices of Y and D are possible. For example, Y can be "complexity of the opinion [in a given context]"; D can be "choice of the context" and "increasing/decreasing of complexity". You can run the opinion of your opponent through different contexts and see how much it reacts to/accommodates the complexity of the world.
I think people very rarely do the 3rd level of empathy.
Doing it systematically would lead to a new political/epistemological paradigm.
I think there are three levels of studying the connection between beliefs and ontology:
What can D and Y be? Both things need to be important even outside of the context of explicit beliefs. A couple of versions:
Thinking at the level of semantic connections should be natural to people, because they use natural language and... neural nets in their brains! (Berkeley makes a similar argument: "hey, folks, this is just common sense!") And yet this idea is extremely alien to people epistemology-wise and ontology-wise. I think the true potential of the 3rd level remains unexplored.
I think most rationalists (Bayesians, LessWrong people) are "confused" between the 2nd level and the 1st level, even though they have some 3rd level tools.
Eliezer Yudkowsky is "confused" between the 1st level and the 3rd level: he likes level 1 ideas (e.g. "map is not the territory"), but has a bunch of level 3 ideas ("some maps are the territory") about math, probability, ethics, decision theory, Security Mindset...
I think there are three levels of exploring the relationship between ontologies and reality:
Y can be "human minds" or simply "objects". D can be "matching/not matching" or "creating a structure" (two very basic, but generally important processes). You get Kant's "Copernican revolution" (reality needs to match your basic ontology, otherwise information won't reach your mind: but there are different types of "matching" and transcendental idealism defines one of the most complicated ones) and Ontic Structural Realism (ontology is not about things, it's about structures created by things) respectively.
On what level are you? Have you studied ontologies/epistemologies without judgement? What are the most interesting ontologies/epistemologies you can think of?
I think there are three levels of doing philosophy in general:
To give a bunch of examples, Y can be:
I think people did a lot of 3rd level philosophy, but we haven't fully committed to the 3rd level yet. We are used to treating philosophy as a closed system, even when we make significant steps outside of that paradigm.
I think there are three levels of values:
Subjective values. You care only about things inside of your mind. For example, do you feel good or not?
Real values. You treat your values as particular objects in reality.
Semantic values. You care about types of changes (D): how your values change/get changed by reality (Y). Your value can be expressed as a combination of three components: "a real thing + its meaning + changes".
Example of a semantic value: you care about your friendship with someone. You will try to preserve the friendship. But in a limited way: you're ready that one day the relationship may end naturally (your value may "die" a natural death). Semantic values are temporal and path-dependent. Semantic values are like games embedded in reality: you want to win the game without breaking the rules.
I think there are three levels of analyzing ethics:
For example, Y can be "tasks, games, activities" and D can be "breaking/creating symmetries". You can study how norms and desires affect properties of particular activities.
Let's imagine an Artificial Intelligence or a genie who fulfills our requests (it's a "game" between us). We can analyze how bad actions of the genie can break important symmetries of the game. Let's say we asked it to make us a cup of coffee:
If it killed us after making the coffee, we can't continue the game. And we ended up with less than we had before. And we wouldn't have made the request if we had known that was going to happen. And the game can't be "reversed": the players are dead.
If it put us under mind control, we can't affect the game anymore (and it gained 100% control over the game). If it placed us into a delusion, then the state of the game can be arbitrarily affected (by dissolving the illusion), and it depends on perspective.
If it made us addicted to coffee, we can't stop or change the game anymore. And the AI/genie drastically changed the nature of the game without our consent. It changed how the "coffee game" relates to all other games, skewed the "hierarchy of games".
Those are all "symmetry breaks". And such symmetry breaks are bad in most of the tasks.
With Categorical Imperative, Kant explored a different choice of Y and D. Now Y is "roles of people", "society" and "concepts"; D is "universalization" and "becoming incoherent/coherent" and other things.
If Y is "preferences" and D is "averaging", we get Preference utilitarianism. (Preferences are important even outside of ethics and "averaging" is important everywhere.) But this idea is too "low-level" to use in analysis of ethics.
However, if Y is "versions of an abstract preference" and D is "splitting a preference into versions" and "averaging", then we get a high-level analog of preference utilitarianism. For example, you can take an abstract value such as Bodily autonomy and try to analyze the entirety of human ethics as an average of versions (specifications) of this abstract value.
Preference utilitarianism reduces ethics to an average of micro-values, the idea above reduces ethics to an average of a macro-value.
So, what's the point of the 3rd level of analyzing ethics? The point is to find objective sub-structures in ethics where you can apply deduction to exclude the most "obviously awful" and "maximally controversial and irreversible" actions. The point is to "derive" ethics from much more broad topics, such as "meaningful games" and "meaningful tasks" and "coherence of concepts".
I think:
There are three levels of looking at properties of objects:
Inherent properties. You treat objects as having more or less inherent properties. E.g. "this person is inherently smart"
Meta-properties. You treat any property as universal. E.g. "anyone is smart under some definition of smartness"
Semantic properties. You treat properties only as relatively attached to objects. You focus on types of changes (D): how properties and their interpretations change/get changed by some other thing Y. You "reduce" properties to D and Y. E.g. "anyone can be a genius or a fool under certain important conditions" or "everyone is smart, but in a unique and important way"
I think there are three levels of commitment to experiences:
You're interested in particular experiences.
You want to explore all possible experiences.
You're interested in types of changes (D): how your experience changes/gets changed by some other thing Y. D and Y need to be important even outside of experience.
So, on the 3rd level you care about interesting ways (D) in which experiences correspond to reality (Y).
I think there are three levels of investigating the connection between experience and morality:
For example, Y can be "[basic] properties of concepts" and D can be "matches / mismatches [between concepts and actions towards them]". You can study how experience affects properties of concepts which in turn bias actions. An example of such analysis: "loving a sentient being feels fundamentally different from eating a sandwich. food taste is something short and intense, but love can be eternal and calm. this difference helps to not treat other sentient beings as something disposable"
I think the existence of the 3rd level isn't acknowledged much. Most versions of moral sentimentalism are 2nd level at best. Epistemic Sentimentalism can be 3rd level in the best case.
I think there are three levels of [studying] patterns:
For example, Y can be "pieces of information" or "contexts": you can study how patterns get discarded or redefined (D) when new information gets revealed/new contexts get considered.
You can study patterns which are "objective", but exist only in a limited context. For example, think about your friend's bright personality (personality = a pattern). It's an "objective" pattern, and yet it exists only in a limited context: the pattern would dissolve if you compared your friend to all possible people. Or if you saw your friend in all possible situations they could end up in. Your friend's personality has some basis in reality (X), has a limited domain of existence (Y) and the potential for change (D).
I think there are three levels in the relationship between patterns and causality. I'm going to give examples about visual patterns:
You learn which patterns are impossible due to local causal processes. For example: "I'm unlikely to see a big tower made of eggs standing on top of each other". It's just not a stable situation due to very familiar laws of physics.
You learn statistical patterns (correlations) which can have almost nothing to do with causality. For example: "people like to wear grey shirts".
You learn types of changes (D): how patterns change/get changed by some other thing Y. D and Y need to be important even outside of (explicit) pattern analysis. And related to causality.
Y can be "basic properties of images" and "basic properties of patterns"; D can be "sharing properties" and "keeping the complexity the same". In simpler words:
On the 3rd level you learn patterns which have strong connections to other patterns and to basic properties of images. You could say such patterns are created/prevented by "global" causal processes. For example: "I'm unlikely to see a place fully filled with dogs. Dogs are not people or birds or insects; they don't create such crowds or hordes." This is very abstract and connects to other patterns and basic properties of images.
I think...
Suppose you want to study different cognitive processes, skills, types of knowledge. There are three levels:
You study particular cognitive processes.
You study types (qualities) of cognitive processes. And types of types (classifications).
You study types of changes (D): how cognitive processes change/get changed by some other thing Y. D and Y need to be important even without the context of cognitive processes.
For example, Y can be "fundamental configurations / fundamental objects" and D can be "finding a fundamental configuration/object in a given domain". You can "reduce" different cognitive process to those Y and D: (names of the processes below shouldn't be taken 100% literally)
^(1 "fundamental" means "VERY widespread in a certain domain")
I know, this looks "funny", but I think all this could be easily enough formalized. Isn't that a natural way to study types of reasoning? Just ask what knowledge a certain type of reasoning learns!
I think there are three ways of doing science:
You predict a specific phenomenon.
You study types of phenomena. (qualities of phenomena)
You study types of changes (D): how the phenomenon changes/gets changed by some other thing Y. D and Y need to be important even outside of this phenomenon.
Imagine you want to explain combustion (why/how things burn):
So, I think phlogiston theory was a step in the right direction, but it failed because the choice of Y and D wasn't abstract enough.
I think most significant scientific breakthroughs require level 3 ideas. Partially "by definition": if a breakthrough is not "level 3", then it means it's contained in a (very) specific part of reality.
I think there are three ways of doing math:
You explore specific mathematical structures.
You explore types of mathematical structures. And types of types. And typologies. At this level you may get something like Category theory.
You study types of changes (D): how equations change/get changed by some other thing Y. D and Y need to be important even outside of (explicit) math.
Let's look at math through the lens of the 3rd level:
All concepts above are "3rd level". But we can classify them, creating new three levels of exploration (yes, this is recursion!). Let's do this. I think there are three levels of mathematico-philosophical concepts:
So, Calculus is really "the king of kings" and "the insight of insights". 3rd level of the 3rd level.
I would classify physico-philosophical concepts as follows:
Concepts that change the way movement affects itself. E.g. Net force, Wave mechanics, Huygens–Fresnel principle
Concepts that change the "meaning" of movement. E.g. the idea of reference frames (principles of relativity), curved spacetime (General Relativity), the idea of "physical fields" (classical electromagnetism), conservation laws and symmetries, predictability of physical systems.
Concepts that change the "essence" of movement, the way movement relates to basic logical categories. E.g. properties of physical laws and theories (Complementarity; AdS/CFT correspondence), the beginning/existence of movement (cosmogony, "why is there something rather than nothing?", Mathematical universe hypothesis), the relationship between movement and infinity (Supertasks) and computation/complexity, the way "possibility" spreads/gets created (Quantum mechanics, Anthropic principle), the way "relativity" gets created (Mach's principle), the absolute mismatch between perception and the true nature of reality (General Relativity, Quantum Mechanics), the nature of qualia and consciousness (Hard problem of consciousness), the possibility of Theory of everything and the question "how far can you take [ontological] reductionism?", the nature of causality and determinism, the existence of space and time and matter and their most basic properties, interpretation of physical theories (interpretations of quantum mechanics).
To define "meta ideas" we need to think about many pairs of "Y, D" simultaneously. This is the most speculative part of the post. Remember, you can treat those speculations simply as sci-fi ideas.
Each pair of abstract concepts (Y, D) defines a "language" for describing reality. And there's a meta-language which connects all those languages. Or rather there's many meta-languages. Each meta-language can be described by a pair of abstract concepts too (Y, D).
^(Instead of "languages" I could use the word "models". But I wanted to highlight that those "models" don't have to be formal in any way.)
I think the idea of "meta-languages" can be used to analyze:
According to the framework, ideas about "meta-languages" define the limit of conceivable ideas.
If you think about it, it's actually a quite trivial statement: "meta-models" (consisting of many normal models) are the limit of conceivable models. Your entire conscious mind is such a "meta-model". If no model works for describing something, then a "meta-model" is your last resort. On one hand "meta-models" are a very trivial idea^(1); on the other hand, nobody has ever cared to explore the full potential of the idea.
^(1 for example, we have a "meta-model" of physics: a combination of two wrong theories, General Relativity and Quantum Mechanics.)
I talked about qualia in general. Now I just want to throw out my idea about the nature of particular percepts.
There are theories and concepts which link percepts to "possible actions" and "intentions": see Affordance. I like such ideas, because I like to think about types of actions.
So I have a variation of this idea: I think that any percept gets created by an abstract dynamic (Y, D) or many abstract dynamics. Any (important) percept corresponds to a unique dynamic. I think abstract dynamics bind concepts.
^(But I have only started to think about this. I share it anyway because I think it follows from all the other ideas.)
Thank you for reading this.
If you want to discuss the idea, please focus on the idea itself and its particular applications. Or on exploring particular topics!
There's an alignment-related problem, the problem of defining real objects. Relevant topics: environmental goals; task identification problem; "look where I'm pointing, not at my finger"; The Pointers Problem; Eliciting Latent Knowledge.
I think I realized how people go from caring about sensory data to caring about real objects. But I need help with figuring out how to capitalize on the idea.
So... how do humans do it?
For example, imagine you're just looking at ducks swimming in a lake. You notice that ducks don't suddenly disappear from your vision (permanence), their movement is continuous (continuity) and they seem to move in a 3D space (3D space). All those patterns ("permanence", "continuity" and "3D space") are useful for predicting aspects of immediate sensory input. But all those patterns are also useful for developing deeper theories of reality, such as the atomic theory of matter. Because you can imagine that atoms are small things which continuously move in 3D space, similar to ducks. (This image stops working as well when you get to Quantum Mechanics, but then aspects of QM feel less "real" and less relevant for defining objects.) As a result, it's easy to see how the deeper model relates to surface-level patterns.
In other words: reality contains "real objects" to the extent to which deep models of reality are similar to (models of) basic patterns in our sensory input.
Creating an inhumanly good model of a human is related to formulating their preferences. A model captures many possibilities and the way many hypothetical things are simulated in the training data. Thus it's a step towards eliminating path-dependence of particular life stories (and the preferences they motivate), by considering these possibilities altogether. Even if some of the possible life stories interact with distortionary influences, others remain untouched, and so must continue deciding their own path, for there are no external influences there and they are the final authority for what counts as aiding them anyway.
> Creating an inhumanly good model of a human is related to formulating their preferences.
How does this relate to my idea? I'm not talking about figuring out human preferences.
> Thus it's a step towards eliminating path-dependence of particular life stories
What is "path-dependence of particular life stories"?
I think things (minds, physical objects, social phenomena) should be characterized by computations that they could simulate/incarnate.
Are there other ways to characterize objects? Feels like a very general (or even fully general) framework. I believe my idea can be framed like this, too.
Models or real objects or things capture something that is not literally present in the world. The world contains shadows of these things, and the most straightforward way of finding models is by looking at the shadows and learning from them. Hypotheses are another toy example.
One of the features of models/things seems to be how they capture the many possibilities of a system simultaneously, rather than isolated particular possibilities. So what I gestured at was that when considering models of humans, the real objects or models behind a human capture the many possibilities of the way that human could be, rather than only how they actually are. And this seems useful for figuring out their preferences.
Path-dependence is the way outcomes depend on the path that was taken to reach them. A path-independent outcome is convergent, it's always the same destination regardless of the path that was taken. Human preferences seem to be path dependent on human timescales, growing up in Egypt may lead to a persistently different mindset from the same human growing up in Canada.
I see. But I'm not talking about figuring out human preferences, I'm talking about finding world-models in which real objects (such as "strawberries" or "chairs") can be identified. Sorry if it wasn't clear in my original message because I mentioned "caring".
> Models or real objects or things capture something that is not literally present in the world. The world contains shadows of these things, and the most straightforward way of finding models is by looking at the shadows and learning from them.
You might need to specify what you mean a little bit.
The most straightforward way of finding a world-model is just predicting your sensory input. But then you're not guaranteed to get a model in which something corresponding to "real objects" can be easily identified. That's one of the main reasons why ELK is hard, I believe: in an arbitrary world-model, "Human Simulator" can be much simpler than "Direct Translator".
So how do humans get world-models in which something corresponding to "real objects" can be easily identified? My theory is in the original message. Note that the idea is not just "predict sensory input", it has an additional twist.
For some time I wanted to apply the idea of probabilistic thinking (used for predicting things) to describing things, making analogies between things. This is important because your hypotheses (predictions) depend on the way you see the world. If you could combine predicting and describing into a single process, you would unify cognition.
Fuzzy logic and fuzzy sets are one way to do it. The idea is that something can be partially true (e.g. "humans are ethical" is somewhat true) or partially belong to a class (e.g. a dog is somewhat like a human, but not 100%). Note that "fuzzy" and "probable" are different concepts. But fuzzy logic isn't enough to unify predicting and describing, because it doesn't tell us much about how we should or could describe the world. No new ideas.
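(A made-up sketch in Python, just to make the "fuzzy vs. probable" distinction concrete; the function name and the numbers are hypothetical, not from any textbook or library. A fuzzy degree measures how much something belongs to a class, while a probability measures how likely a crisp yes/no statement is to be true.)

```python
# Toy sketch (hypothetical): "fuzzy" vs "probable".
# A fuzzy membership degree says HOW MUCH something belongs to a class;
# a probability says how LIKELY a crisp yes/no statement is to be true.

def human_likeness_of_dog(follows_commands: bool, uses_tools: bool) -> float:
    """Fuzzy membership of a dog in the class 'human-like', in [0, 1]."""
    degree = 0.3                      # baseline: a dog is somewhat like a human
    if follows_commands:
        degree += 0.2
    if uses_tools:
        degree += 0.3
    return min(degree, 1.0)

# Fuzzy: this dog IS 0.5 human-like. Nothing is uncertain; it's a matter of degree.
print(human_likeness_of_dog(True, False))   # 0.5

# Probable: we are 50% sure the crisp statement "this animal is a dog" is true.
p_is_dog = 0.5
```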
I have a different principle for unifying probability and description. Here it is:
Properties of objects aren't contained in specific objects. Instead, there's a common pool that contains all possible properties. Objects take their properties from this pool. But the pool isn't infinite. If one object takes 80% of a certain property from the pool, other objects can take only 20% of that property (e.g. "height"). Socialism for properties: it's not your "height", it's our "height".
How can an object "take away" properties of other objects? For example, how can a tall object "steal" height from other objects? Well, imagine there are multiple interpretations of each object. Interpretation of one object affects interpretation of all other objects. It's just a weird axiom. Like a Non-Euclidean geometry.
This sounds strange, but this connects probability and description. And this is new. I think this principle can be used in classification and argumentation. Before showing how to use it I want to explain it a little bit more with some analogies.
Imagine two houses, A and B. Those houses are connected in a specific way.
When one house turns on the light at 80%, the other turns on the light only at 20%.
When one house uses 60% of the heat, the other uses only 40% of the heat.
(When one house turns on the red light, the other turns on the blue light. When one house is burning, the other is freezing.)
Those houses take electricity and heat from a common pool. And this pool doesn't have infinite energy.
Usually people think about qualities as something binary: you either have them or not. For example, a person can be either kind or not.
For me an abstract property such as "kindness" is like the white light. Different people have different colors of "kindness" (blue kindness, green kindness...). Every person has kindness of some color. But nobody has all colors of kindness.
Abstract kindness is the common pool (of all ways to express it). Different people take different parts of that pool.
Theism analogy. You can compare the common pool of properties to the "God object", a perfect object. All other objects are just different parts of the perfect object. You also can check out Monadology by Gottfried Leibniz.
Spectrum analogy. You can compare the common pool of properties to the spectrum of colors. Objects are just colors of a single spectrum.
Ethics analogy. Imagine that all your good qualities also belong (to a degree) to all other people. And all bad qualities of other people also belong (to a degree) to you. As if people take their qualities from a single common pool.
Buddhism analogy. Imagine that all your desires and urges come (to a degree) from all other people. And desires and urges of all other people come (to a degree) from you. There's a single common pool of desire. This is somewhat similar to karma. In rationality there's also a concept of "values handshakes": when different beings decide to share each other's values.
Quantum analogy. See quantum entanglement. When particles become entangled, they take their properties from a single common pool (quantum state).
Fractal analogy. "All objects in the Universe are just different versions of a single object."
Subdivision analogy. Check out the Finite subdivision rule. You can compare the initial polygon to the common pool of properties. And different objects are just pieces of that polygon.
Recursion. If objects take their properties from the common pool, it means they don't really have (separate) identities. It also means that a property (X) of an object is described in terms of all other objects. So, the property (X) is recursive, it calls itself to define itself.
For example, imagine we have objects A, B and C. We want to know their heights. In order to do this we may need to evaluate those functions:
A priori assumptions about objects should allow us to simplify this and avoid cycles.
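Since the functions themselves aren't spelled out here, below is a minimal sketch of what such circular definitions could look like. All names and numbers are my own invention; the a priori assumption acts as the base case that breaks the cycle.

```python
# Sketch only: each object's height is defined through the others' heights,
# and an a priori assumption serves as the base case that stops the recursion.

GUESS = {"A": 0.5, "B": 0.3, "C": 0.2}   # invented standalone impressions
ANCHOR = {"A": 0.5}                      # a priori assumption that breaks the cycle

def height(obj, asked=()):
    """Share of the common 'height' pool claimed by `obj`."""
    if obj in ANCHOR:
        return ANCHOR[obj]               # base case: a priori assumption
    if obj in asked:
        return GUESS[obj]                # cycle detected: fall back on the raw impression
    others = [o for o in GUESS if o != obj]
    taken = sum(height(o, asked + (obj,)) for o in others)
    # an object can claim at most whatever the others leave in the pool
    return min(GUESS[obj], max(0.0, 1.0 - taken))

print({o: round(height(o), 2) for o in GUESS})   # here: {'A': 0.5, 'B': 0.3, 'C': 0.2}
```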
Fractals. See Coastline paradox. You can treat a fractal as an object with multiple interpretations (where an interpretation depends on the scale). Objects taking their properties from the common pool = fractals taking different scales from the common range.
To explain how to classify objects using my principle, I need to explain how to order them with it.
I'll explain it using fantastical places and videogame levels, because those things are formal and objective enough (they are 3D shapes). But I believe the same classification method can be applied to any objects, concepts and even experiences.
Basically, this is an unusual model of contextual thinking. If we can formalize this specific type of contextual thinking, then maybe we can formalize contextual thinking in general. This topic will sound very esoteric, but it's the direct application of the principle explained above.
(I interpret paintings as "real places": something that can be modeled as a 3D shape. If a painting is surreal, I simplify it a bit in my mind.)
Take a look at those places: image.
Let's compare 2 of them: image. Let's say we want to know the "height" of those places. We don't have a universal scale to compare the places. Different interpretations of the height are possible.
If we call a place "very tall", we need to understand the epithet "very tall" in probabilistic terms, such as "70-90% tall", and we need to imagine that this probability is taken away from all other places. We can't have two different "very tall" places. Probability should add up to 100%.
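A toy numerical illustration of that last point, assuming (with invented numbers) that two places each look "80% tall" in isolation:

```python
# Two places that each look "80% tall" on their own cannot both keep that share:
# "tallness" is read as a share of one fixed pool, so the shares are renormalized.

impressions = {"place_1": 0.8, "place_2": 0.8}   # invented standalone impressions

total = sum(impressions.values())
shares = {name: value / total for name, value in impressions.items()}

print(shares)   # {'place_1': 0.5, 'place_2': 0.5}
```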
Now take a look at another place (A): image (I ignore the cosmos to simplify it). Let's say we want to know how enclosed it is. In one interpretation, it is massively enclosed by trees. In another interpretation, trees are just a decorative detail and can be ignored. Let's add some more places for context: image. They are definitely more open than the initial place, so we should update towards more enclosed interpretation of (A). All interpretations should be correlated and "compatible". It's as if we're solving a puzzle.
You can say that properties of places are "expandable". Any place contains a seed of any possible property and that seed can be expanded by a context. "Very tall place" may mean Mt. Everest or a molehill depending on context. You can compare it to a fractal: every small piece of a fractal can be expanded into the entire thing. And I think it's also very similar to how human language, human concepts work.
You also may call it "amplification of evidence": any smallest piece of evidence (or even absence of any evidence) can be expanded into very strong evidence by context. We have a situation like in the Raven paradox, but even worse.
(I interpret paintings as "real" places.)
Places in random order: image.
My ordering of places: image.
I used 2 metrics to evaluate the places:
The places go from "box-like and enclosed" to "not box-like and open" in my ordering.
But to see this you need to look at the places in a certain way, reason about them in a certain way:
Almost any property of any specific place can be "illusory". But when you look at places in context, you can deduce their properties via the process of elimination.
You can apply the same idea (about the "common pool") to hypotheses and argumentation:
In a way it means that specific hypotheses/beliefs just don't exist, they're melted into a single landscape. It may sound insane ("everything is true at the same time and never proven wrong" and also relative!). But human language, emotions, learning, pattern-matching and research programs often work like this. It's just a consequence of ideas (1) not being atomic statements about the world and (2) not being focused on causal reasoning, causal modeling. And it's rational to not start with atomic predictions when you don't have enough evidence to locate atomic hypotheses.
You can split rationality into 2 components. The second component isn't explored. My idea describes the second component:
Causal and Descriptive rationality work according to different rules. Causal uses Bayesian updating. Descriptive uses "the common pool of properties + Bayesian updating", maybe.
Example: Vitalism. It was proven wrong in causal terms. But in descriptive terms it's almost entirely true. Living matter does behave very differently from non-living matter. Living matter does have a "force" that non-living matter doesn't have (it's just not a fundamental force). Many truths of vitalism were simply split between different branches of science: living matter is made out of special components (biology/microbiology) including nanomachines/computers!!! (DNA, genetics), can have cognition (psychology/neuroscience), can be a computer (computer science), can evolve (evolutionary biology), can do something like "decreasing entropy" (an idea by Erwin Schrödinger, see entropy and life). On the other hand, maybe it's bad that vitalism got split into so many different pieces. Maybe it's bad that vitalism failed to predict reductionism. However, behaviorism did get overshadowed by cognitive science (living matter did turn out to be more special than it might have been). Our judgement of vitalism depends on our choices, but at worst vitalism is just the second best idea. Or the third best idea compared to some other version of itself... The absolute death of vitalism is astronomically unlikely, and it would take most of reductionism and causality down with it, along with most of our knowledge about the world. Vitalism partially just restates our knowledge ("living matter is different from non-living matter"), so it's strange to simply call it wrong. It's easier to improve vitalism than to disprove it.
Perhaps you could call the old version of vitalism "too specific given the information about the world": why should a "life-like force" be beyond the laws of physics? But even this would have been debatable at the time. By the way, the old sentiment "science is too weak to explain living things" can be considered partially confirmed: 19th century science lacked a bunch of conceptual breakthroughs. And "only organisms can make the components of living things" is partially just a fact of reality: skin and meat don't randomly appear in nature. This fact was partially weakened, but also partially strengthened with time. The discovery of DNA strengthened it in some ways. It's easy to overlook all of this.
In Descriptive rationality, an idea is like a river. You can split it, but you can't stop it. And it doesn't make sense to fight the river with your fists: just let it flow around you. However, if you did manage to split the river into independent atoms, you get Causal rationality.
I think causal rationality has some problems and those problems show that it has a missing component:
I'm not saying that all of this is impossible to solve with Causal rationality. I'm saying that Causal rationality doesn't give any motivation to solve all of this. When you're trying to solve it without motivation you kind of don't know what you're doing. It's like trying to write a program in bytecode without having high-level concepts even in your mind. Or like trying to ride an alien device in the dark: you don't know what you're doing and you don't know where you're doing.
What and where are we doing when we're trying to fix rationality?
Crash Bandicoot N. Sane Trilogy
My ordering of some levels: image. Videos of the levels: Level 1, Level 2, Level 3, Level 4, Level 5, Level 6.
I used 2 metrics to evaluate the levels:
The levels go from "vertical and separable" to "horizontal and not separable".
But to see this you need to note:
Any question about any property of any level is answered by another question: is this property already "occupied" by some other level?
Places in random order: image.
My ordering of places: image.
I used 2 metrics to evaluate the places:
The places go from "box-like and outside" to "not box-like and inside".
But to see this you need to note:
If you feel this relativity of places' properties, then you understand how I think about places. You don't need to understand a specific order of places perfectly.
My ordering of some levels: image. Videos of the levels: Level 1, Level 2, Level 3, Level 4, Level 5, Level 6, Level 7
I used 1 metric to evaluate the levels:
Levels go from 3D to 2D to 0D.
But to see this you need to note:
Each level is described by all other levels. This recursive logic determines what features of the levels matter.
When objects take their properties from a single pool of properties, there may appear "negative objects". It happens when objects A and B take away opposite properties from a third object C (with equal force). For example, A may take away height from C. But B takes away shortness (anti-height) from C. So, "negative objects" are like contradictions. You can't fit a negative object anywhere in the order of positive objects.
Let's get back to Crash Bandicoot 3 and add two levels: image. Videos of the levels: Level -2, Level -1
Note that negative levels are still connected with all the other levels anyway: their properties are still determined by properties of all other levels, just in a more complicated way.
You can order negative levels by using the metrics for positive levels. In the case above, you can do it like this:
There are also "hyper objects" (hyper positive and hyper negative objects). Such objects take "too much" or "too little" from the common pool of properties compared to normal objects.
How do hyper objects appear? I may not be able to explain it. Maybe a hyper object appears when an object takes a property (equally strongly) from objects with very different amounts of that property. That was confusing and vague, so here's an analogy: imagine a number that's very far away from the numbers 2 and 5, but equally far from both. It has distance 10 from both 2 and 5. How can this be? This number has to go somewhere "sideways"... it must be a complex number. So, you can compare hyper objects to complex numbers.
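To put toy numbers on the analogy (the distances 10, 2 and 5 are from the text, the rest is mine):

```python
# No real number is at distance 10 from both 2 and 5, but a complex number can be:
# it sits above the midpoint 3.5 and goes "sideways" off the real line.

y = (10**2 - 1.5**2) ** 0.5    # vertical offset from the midpoint
z = 3.5 + y * 1j

print(abs(z - 2), abs(z - 5))  # both distances come out as 10 (up to rounding)
```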
An example of hyper levels for Crash Bandicoot 3: image. Video of the levels: "Bye Bye Blimps", "N. Gin"
You may be asking "How can ordering things be related to anything?" Prepare for a slightly abstract argument.
Any thought/experience is about multiple things coexisting in your mental state. So, any thought/experience is about direct or indirect comparison between things. And any comparison can be described by an order or multiple orders.
So, "my orders + arithmetic orders" is something like a Turing machine: a universal model that can describe any thought/experience, any mental state. Of course, a Turing machine can describe anything my method can describe, but my method is more high-level.
I know that what I described above doesn't automatically specify a mathematical model. But I think we should be able to formalize my idea easily enough. If not, then my idea is wrong.
We have those hints for formalization:
To be honest, I'm bad at math. I based my theory on synesthesia-like experiences and conceptual ideas. But if the information above isn't enough, I can try to give more. I have experience making my idea more specific, so I could guess how to make it even more specific (if we encounter a problem). Please help me formalize this idea.
"Everything is relative."
You know this phrase, right? But "relativity" is relative too. Maybe something is absolute.
But "relativity of relativity" is relative too. Maybe nothing is absolute after all... Those thoughts create an infinite tower of meta-levels.
If you think about the statement "truth = lie" ("you can go from T to F") you can get a similar tower. (Because it also implies "you can NOT go from T to F" and "you can go from "you can NOT go from T to F" to "you can go from T to F"" and so on.) It's not formal, but still interesting. Informally, the statement "truth = lie" is equivalent to "everything is relative".
Hierarchy of meta-levels is relative.
Imagine an idealist and a materialist. The materialist thinks "I'm meta compared to the idealist - I can analyze their thought process through physics". The idealist thinks "the materialist thinks they're meta compared to me, but thinking in terms of physics is just one possible experience". So, "my thought process = the most important thing" and "my thought process + physics = the most important things" are both meta compared to each other; each can do a meta-analysis of the other.
Both materialism and idealism can model each other. Materialism can be modeled by meta-idealism. Meta-idealism can be modeled by meta-materialism. Meta-materialism can be modeled by meta-meta-idealism. And so on. (Those don't have to be different models, it's just convenient to think about it in terms of levels.)
The same thing with altruism and selfishness. Altruism can be modeled by meta-selfishness. Meta-selfishness can be modeled by meta-meta-altruism. And you can abstract it to any property (A) and its negation (not A), because any property can be treated as a model of the world. So, this idea can be generalized as "A = not A".
Points and lines
Next step of the idea: for meta-level objects, lower-level objects are indistinguishable.
If you think in terms of points, two different points (A and B) are different objects to you. If you think in terms of lines, then points A and B may be parts of the same object. Or, on the other hand, the same point can be a part of completely different objects.
A universe of objects
Now imagine that some points are red and other points are blue. And we don't care about the shape of a line.
Level-1 lines contain only blue (positive) or only red (negative) points.
Level-2 lines can contain both types of points. E.g. they can contain mostly blue (complex positive) or mostly red (complex negative) points.
So, you can get different kinds of objects out of this, somewhat similar to numbers. I guess you can do this in many different ways. For example, you may have a spectrum of colors. Or you may have a positive and a negative spectrum. To me it's very important, because it connects to my synesthesia: see here. That post is very unclear (I don't advise reading it), but sadly I don't know how to explain everything better yet.
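For what it's worth, here is a toy version of the point/line construction, just to show it can be made concrete (the representation and labels are mine):

```python
# Toy classifier: a "line" is a sequence of colored points, classified by its colors.

def classify(points):
    blues, reds = points.count("blue"), points.count("red")
    if reds == 0:
        return "level-1 positive"     # only blue points
    if blues == 0:
        return "level-1 negative"     # only red points
    return "level-2 complex positive" if blues >= reds else "level-2 complex negative"

print(classify(["blue", "blue", "blue"]))         # level-1 positive
print(classify(["blue", "red", "blue", "blue"]))  # level-2 complex positive
```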
Perhaps the alternative to the "maximize one thing subject to a price-to-humans constraint" would be not making the AI that specialized. Make it maximize across a basket of things humans want.
While I have heard the "paperclips take over the universe" worry, it seems that this type of thought experiment introduces the problem to begin with (making a bit of a circular error). As I gather (indirectly), the problem is that the paperclip-maximizing AI ends up taking over the entire economy. That seems equivalent to suggesting the AI replaces all the markets and other economic decisions (being smarter, faster and more competitive, I guess).
If so, isn't an obvious solution to give it multiple (infinite, in the sense of unlimited human wants) things to maximize? While it might replace the human production economy, it's going to produce some form of a current-state production possibility frontier and, I would think, an intertemporal one as well, which might address some inter-generational concerns.
I don't think that fully solves the alignment problem (as I understand it -- possibly poorly) but I do think it shifts what the risks are and may well eliminate a lot of the existential risks people worry about.
I want to share a way of dissolving disagreements. It's also a style of thinking. I call it "the method of statements", here's the description:
Take an idea, theory or argument. Split it into statements of a certain type. (Or multiple types.)
Evaluate the properties of the statements. Do they exist (i.e. can they be defined, does anything connect them)? Can they be used, are they constructive? Are they simple? Are they important? Etc.
Try to extract as much information as possible from those statements.
One rule:
When you evaluate an argument with the method of statements, you don't evaluate the "logic" of the argument or its "model of the world". You evaluate properties of statements implied by the argument. Do statements in question correlate with something true or interesting?
You also may apply the method to analyzing information. You may split the information about something into statements of a certain type and study the properties of those statements. I can't define what a "statement" is. It's the most basic concept. Sometimes "statements" are facts, but not always. "Statements" may even be non-verbal. A set of statements can be defined in any way possible.
I will give a couple of less controversial (for rationalists) examples of applying the method. Then a couple of more controversial examples. And then share a couple of my own ideas in the context of the method. But before this...
There's a technique called "rationalist taboo". Imagine a disagreement about this question:
If a tree falls in a forest and no one is around to hear it, does it make a sound? (on wikipedia)
We may try to resolve the disagreement by trying to replace the label "sound" with its more specific contents. Are we talking about sound waves, the vibration of atoms? Are we talking about the subjective experience of sound? Are we talking about mathematical models and hypothetical imaginary situations? An important point is that we don't try to define what "sound" is, because it would only lead to a dispute about definitions.
The method of statements is somewhat similar to taboo. But with the method we "taboo" ideas and arguments themselves. We take an idea and replace it with its more specific, more atomic semantic contents. We take a thought and split it into smaller thoughts. It's "taboo" applied on a different level, a "meta-taboo" applied to the process of thinking itself.
However, rationalist taboo and the method of statements may be in direct conflict. Because rationalist taboo assumes that a "statement" is meaningless if it can't be formalized or expressed in a particular epistemology.
This part of the post is about reception of two LessWrong ideas/topics.
What is Logical Decision Theory (LDT)? You can check out "An Introduction to Logical Decision Theory for Everyone Else"
With the usual way of thinking, even if a person is sympathetic enough to LDT they may react like this:
I think the idea of LDT is important, but...
I'm not sure it can be formalized (finished).
I'm not sure I agree with it. It seems to violate such and such principles.
You can get the same results by fixing old decision theories.
Conclusion: "LDT brings up important things, but it's nothing serious right now".
(The reaction above is inspired by the criticisms of LDT by William David MacAskill and Prof. Wolfgang Schwarz)
With the method of statements, being sympathetic enough to LDT automatically entails this (or even more positive) reaction:
There exist two famous types of statements: "causal statements" (in CDT) and "evidential statements" (in EDT). LDT hypothesizes the third type, "logical statements". The latter statements definitely exist. They can be used in thinking, i.e. they are constructive enough. They are simple enough ("conceptual"). And they are important enough. This already makes LDT a very important thing. Even if you can't formalize it, even if you can't make a "pure" LDT.
Logical statements (A) in decision theory are related to another type of logical statements (B): statements about logical uncertainty. We have to deal with the latter ones even without LDT. Logical statements (A) are also similar to more established albeit not mainstream "superrational statements" (see Superrationality).
Logical statements can be translated into other types of statements. But this doesn't justify refusing to talk about them.
Conclusion: "LDT should be important, whatever complications it has".
The method of statements dissolves a number of things:
It dissolves counter-arguments about formalization: "logical statements" either exist or don't, and if they do they carry useful information. It doesn't matter if they can be formalized or not.
It dissolves minor disagreements. "Logical statements" either can or can't be true. If they can there's nothing to "disagree" about. And true statements can't violate any (important) principles.
Some logical suggestions do seem weird and unintuitive at first. But this weirdness may dissolve when you notice that those suggestions are properties of simple statements. If those statements can be true, then there's nothing weird about the suggestions. At the end of the day, we don't even have to follow the suggestions while agreeing that the statements are true and important. Statements are sources of information, nothing more and nothing less.
I think the usual way of thinking may be very reasonable but, ultimately, it's irrational, because it prompts unjust comparisons of ideas and favors ideas which look more familiar and easier to understand/implement in the short run. With the usual way of thinking it's very easy to approach something in the wrong way and "miss the point".
If you want to describe human values, you can use three fundamental types of statements (and mixes between the types). Maybe there're more types, but I know only those three:
Any of those types can describe unaligned values. So, any type of those statements still needs to be "charged" with values of humanity. I call a statement "true" if it's true for humans.
We need to find the statement type with the best properties. Then we need to (1) find a language for this type of statements (2) encode some true statements and/or describe a method of finding "true" statements. If we've succeeded we solved the Alignment problem.
I believe X statements have the best properties, but their existence is almost entirely ignored in the Alignment field.
I want to show the difference between the statement types. Imagine we ask an Aligned AI: "if a human asked you to make paperclips, would you kill the human? Why not?" Possible answers with different statement types:
X statements have those better properties compared to other statement types:
I can't define human values, but I believe values exist. The same way I believe X statements exist, even though I can't define them.
I think existence of X statements is even harder to deny than existence of value statements. (Do you want to deny that you can make statements about general properties of systems and tasks?) But you can try to deny their properties.
X statements are almost entirely ignored in the field (I believe), but not completely ignored.
Impact measures ("affecting the world too much is bad", "taking too much control is bad") are X statements. But they're a very specific subtype of X statements.
Normativity (by abramdemski) is a mix between value statements and X statements. But statements about normativity lack most of the good properties of X statements. They're too similar to value statements.
I want to discuss a particular failure mode of communication and thinking in general. I think it affects our thinking about AI Alignment too.
Communication. A person has a vague but useful idea (P). This idea is applicable on one level of the problem. It sounds similar to another idea (T), applicable on a very different level of the problem. Because of the similarity, nobody can understand the difference between (P) and (T). People end up overestimating the vagueness of (P) and not considering it, because people aren't used to mapping ideas to "levels" of a problem. Information that should give more clarity (P is similar to T) ends up creating more confusion. I think this is irrational; it's a failure of dealing with information.
Thinking in general. A person has a specific idea (T) applicable on one level of a problem. The person doesn't try to apply a version of this idea on a different level. Because (1) she isn't used to it (2) she considers only very specific ideas, but she can't come up with a specific idea for other levels. I think this is irrational: rationalists shouldn't shy away from vague ideas and evidence. It's a predictable way to lose.
A comical example of this effect:
B can't understand A, because B thinks about the problem on the level of "chemical reactions". On that level it doesn't matter what heats the food, so it's hard to tell the difference between exploding the oven and using the oven in other ways.
The bad news is that the "taboo technique" (replacing a concept with its components: "unpacking" a concept) may fail to help. Because A doesn't know the exact way to turn on the oven or the exact way the oven heats the food. Her idea is very useful if you try it, but it doesn't come with a set of specific steps.
And the worst thing is that A may not be there in the first place. There may be no one around to even bother you to try to use your oven differently.
I think rationality doesn't have a general cure for this, but this may actually be one of the most important problems of human reasoning. I think the entirety of human knowledge is diseased with this. Our knowledge is worse than Swiss cheese and we don't even try to fill the gaps.
Any good idea that was misunderstood and forgotten was forgotten because of this. Any good argument that was ignored and ridiculed was ignored because of this. It all got lost in the gaps.
I think one method to resolve misunderstanding is to add some metrics for comparing ideas. Then talk about something akin to probability distributions over those metrics. A could say:
""Instruments have parts with different functions. Those functions are not the same, even though they may intersect and be formulated in terms of each other:
In practice, some parts of the instrument realize both functions. E.g. the handle of a hammer actually allows you not only to control the hammer, but also to speed up the hammer more effectively.
When we blow up the oven, we use 99% of the first function of the oven. But I believe we can use 80% of the second function and 20% of the first.""
Let's explore some ideas to learn to attach ideas to "levels" of a problem and seek "gaps". "(gap)" means that the author didn't consider/didn't write about that idea.
Two of those ideas are from math. Maybe I shouldn't have used them as examples, but I wanted to give diverse examples.
(1) "Expected Creative Surprises" by Eliezer Yudkowsky. There are two types of predictability:
Sometimes they're the same thing. But sometimes you have:
(2) "Belief in Belief" by Eliezer Yudkowsky. Beliefs exist on three levels:
Sometimes a belief exists on all those levels and contents of the belief are the same on all levels. But sometimes you get more interesting types of beliefs, for example:
(3) "The Real Butterfly Effect", explained by Sabine Hossenfelder. There're two ways in which consequences of an event spread:
In a way it's kind of the same thing. But in a way it's not:
(4) "P=NP, relativisation, and multiple choice exams", Baker-Gill-Solovay theorem explained by Terence Tao. There are two dodgy things:
Sometimes they are "the same thing", sometimes they are not.
(5) "Free Will and consciousness experience are a special type of illusion." An idea of Daniel Dennett. There are 2 types of illusions:
Conscious experience is an illusion of the second type, Dennett says. I don't agree, but I like the idea and think it's very important.
Somewhat similar to Fictionalism: there are lies and there are "truths of fiction", "useful lies". Mathematical facts may be the same type of facts as "Macbeth is insane/Macbeth dies".
(6) "Tlön, Uqbar, Orbis Tertius" by Jorge Luis Borges. A language has two functions:
Different languages can have different focus on those functions:
I think there's an important gap in Borges's ideas: Borges doesn't consider a language with extremely strong, but not absolute emphasis on the second function. Borges criticizes his languages, but doesn't steelman them.
(7) "Pierre Menard, Author of the Quixote" by Jorge Luis Borges. There are 3 ways to copy a text:
Pierre Menard wants to copy 1% of the 1 and 98% of the 2 and 1% of the 3: Pierre Menard wants to imagine exactly the same text but with completely different thoughts behind the words.
("gap") Pierre Menard also could try to go for 100% of 3 and for "anti 99%" of 4: try to write a completely new text by experiencing the same thoughts and urges that created the old one.
You can use the same thinking to analyze/classify puzzles.
Inspired by Pirates of the Caribbean: Dead Man's Chest. Jack has a compass that can lead him to a thing he desires. Jack wants to find a key. Jack can have those experiences:
In order for the compass to work, Jack may need (almost) any mix of those: for example, maybe pure desire is enough for the compass to work. But maybe you need to mix pure desire with seeing at least a drawing of the key (so you have more of a picture of what you want).
Jack has those possibilities:
Gibbs thinks about doing 100% of 1 or 100% of 2 and gets confused when he learns that's not the plan. Jack thinks about 50% of 1 and 50% of 2: you can go after the chest in order to use it to get the key. Or you can go after the chest and the key "simultaneously" in order to keep Davy Jones distracted and torn between two things.
Braid, Puzzle 1 ("The Ground Beneath Her Feet"). You have two options:
You need 50% of 1 and 50% of 2: first you ignore the platform, then you move the platform... and rewind time to mix the options.
Braid, Puzzle 2 ("A Tingling"). You have the same two options:
Now you need 50% of 1 and 25% of 2: you need to rewind time while the platform moves. In this time-manipulating world outcomes may not add up to 100% since you can erase or multiply some of the outcomes/move outcomes from one timeline to another.
You can use the same thing to analyze arguments and opinions. Our opinions are built upon thousands and thousands "false dilemmas" that we haven't carefully revised.
For example, take a look at those contradicting opinions:
Usually people think you have to believe either "100% for 1" or "100% for 2". But you can believe in all kinds of mixes.
For example, I believe in 90% of 1 and 10% of 2: people may be "stupid" in this particular nonsensical world, but in a better world everyone would be a genius.
You can treat an idea as a "(quasi)probability distribution" over some levels of a problem/topic. Each detail of the idea gives you a hint about the shape of the distribution. (Each detail is a bit of information.)
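A rough sketch of what this could look like, assuming (purely for illustration) three invented "levels" and two invented "details" that act like likelihoods:

```python
# Treating an idea as a quasiprobability distribution over "levels" of a problem.
# Each detail nudges the weights multiplicatively, like a likelihood in Bayes' rule.

levels = {"practical how-to": 1.0, "conceptual framing": 1.0, "formal model": 1.0}

details = [                               # invented "bits" carried by the idea's details
    {"practical how-to": 0.2, "conceptual framing": 1.5, "formal model": 1.2},
    {"practical how-to": 0.5, "conceptual framing": 1.4, "formal model": 0.8},
]

for detail in details:
    levels = {lvl: levels[lvl] * detail[lvl] for lvl in levels}
    total = sum(levels.values())
    levels = {lvl: w / total for lvl, w in levels.items()}   # renormalize to 100%

print({lvl: round(w, 2) for lvl, w in levels.items()})
```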
We usually don't analyze information like this. Instead of cautiously updating our understanding with every detail of an idea we do this:
Note: maybe you can apply the same idea about "bits" to chess (and other games). Each idea and each small advantage you need to come up with the winning plan is a "bit" of information/advantage. Before you get enough information/advantage bits, the position looks like a cloud where you don't see what to do.
I think you can measure "richness" of theories (and opinions and anything else) using the same quasiprobabilities/bits. But this measure depends on what you want.
Compare those 2 theories explaining different properties of objects:
Let's add a metric to compare 2 theories:
Let's say we're interested in physical objects. B-theory explains properties through 90% of 1 and 10% of 2: it makes properties of objects equivalent to the reason of their existence. A-theory explains properties through 100% of 2. B-theory is more fundamental, because it touches on a more fundamental topic (existence).
But if we're interested in mental objects... B-theory explains only 10% of 2 and 0% of 1. And A-theory may be explaining 99% of 1. If our interests are different A-theory turns out to be more fundamental.
When you look for a theory (or opinion or anything else), you can treat any desire and argument as a "bit" that updates the quasiprobabilities like the ones above.
We could help each other to find gaps in our thinking! We could do this in this thread.
I want to explain what I perceive as missed ideas in Alignment. And discuss some other ideas.
(1) You can split possible effects of AI's actions into three domains. All of them are different (with different ideas), even though they partially intersect and can be formulated in terms of each other. Traditionally we focus on the first two domains:
I think the third domain is mostly ignored, and it's a big blind spot.
I believe that "human (meta-)ethics" is just a subset of a way broader topic: "properties of (any) systems". And we can translate the method of learning properties of simple systems into a method of learning human values (a complicated system). And we can translate results of learning those simple systems into human moral rules. And many important complicated properties (such as "corrigibility") has analogies in simple systems.
(2) Another "missed idea":
"True Love(TM) towards a sentient being" feels fundamentally different from "eating a sandwich", so it could be evidence that human experiences have an internal structure and that structure plays a big role in determining values. But not a lot of models (or simply 0) take this "fact" into account. Not surprisingly, though: it would require a theory of human subjective experience. But still, can we just ignore this "fact"?
(3) Preference utilitarianism says:
I think there's a missed idea: you could try to describe entire ethics by a weighted aggregation of a single... macroscopic value.
(4) Connectionism and Connectivism. I think this is a good example of a gap in our knowledge:
I think one layer of the idea is missing: you could say that concepts in the human mind are somewhat like neurons. Maybe human thinking is like a fractal, looks the same on all levels.
(5) Bayesian probability. There's an idea:
I think this idea should have a "counterpart": maybe you can describe macroscopic things in terms of each other. And not only outcomes. Using something somewhat similar to probabilistic reasoning, to Bayes' rule.
That's what I tried to do in this post.
(Drafts of a future post.) I want to confront/explain my optimism. Here's a thought experiment to explain what "optimism" means to me:
Imagine a world like Earth. There's an underground prison. People live there for generations. The prison is constructed in such a way that you can live there "forever". People are not aware that the world outside of the prison exists.
One person in the prison imagines freedom. But she doesn't have evidence (or so it seems).
...
...
...
What is the prison of our world? I can think of 11 prisons. Those are 11 main factors that (try to) limit my optimism.
Prison of Death. Humans die. However, this fact doesn't imprison the entire humanity. And death isn't logically necessary.
Prison of Pain. People experience pain. And if we didn't, the pain would still exist as a concept, there would be a way to create pain (maybe). This fact doesn't imprison entire humanity and pain isn't necessary.
Prison of Experience. Your experience doesn't matter. 99% of your experience doesn't give you knowledge (power), doesn't let you help anybody and barely matters in the culture. A couple of math theorems are "more important" (give more power, are better remembered in the culture) than 50 years of someone's suffering.
Prison of Communication. This is one of the prisons from the thought experiment. I can't communicate my ideas. I can't communicate the value I feel. If I turn out to be wrong, if I fail, I'll never be able to tell my story and why I believed what I believed. I'll never share the way I saw the world. And I'll never know the "true", non-generic reason of my failure.
Prison of Complexity. Human type of thinking can exist only on a certain level of computational power.
Prison of Inequality. The possibility that people are not "equal" in some important aspect. This is a pessimistic thing because it contradicts the concept of personality: why are people different if difference is bad? And if difference is good, then inequality won't let us notice this anyway.
Prison of Badness. Humans are born to think, but human thinking is the baddest and most egoistic and broken thing ever (compared to Bayes and "shut up and multiply").
...
Here're some more, it feels as if they have a different flavor:
Prison of Impossible Problems. Humanity is bound to face "unsolvable" problems. Unable to "solve extinction" in time.
Prison of Time/Opportunities. You don't have the time and opportunities to develop your potential. And to experience everything you need.
Prison of Free Will. We don't have free will.
Prison of Afterlife/God. There's no afterlife and no God.
Imagine the classic paperclip maximizer thought experiment. We tell the AGI to make paperclips, and the AGI uses all the matter in the Universe to do it.
But now imagine a different version: we tell the AGI to "make paperclips that cost 1¢ (in the human economy)". Now killing everyone isn't a solution: destroying humanity would destroy the economy.
Isn't it an interesting version of the thought experiment? Of course, everything can/will go wrong anyway, but maybe in a funnier and more convoluted way. Funnier and more convoluted than "maximize human smiles", for example. Because the AGI needs to take into account the effects of a system (the economic system), not just fulfill some fixed conditions.
I first mentioned the idea in this comment, a couple of people disagreed.
we tell the AGI to "make paperclips that cost 1¢ (in the human economy)". Now killing everyone isn't a solution: destroying humanity would destroy the economy.
Seems to collapse easily: how does the AI decide what costs $0.01, exactly? Does it use the last price of a transaction on a market, and is doing mark-to-market? Well... among many other problems that occur to me, the most immediate one is that the price can't change if there aren't any more transactions, now can it. Nothing about 'make paperclips that would cost $0.01' would seem to rule out market manipulation, monopolization, or destruction. No market, no changes in the price you are marking to, no risk or volatility, no crash in prices due to oversupply, and enables efficient planning for the future and maximizing production of paperclips that would have cost $0.01 on what used to be Earth.
(The humorous fictional version of this story would involve 2 of the last survivors locating a sensor of the AI, building a large hollow paperclip-shaped human habitat, and loudly and ostentatiously in front of the sensor, having the 'owner' sell it to the other survivor for exactly $0.01, and then buying a regular paperclip for the smallest amount they can write on an IOU using Knuth up-arrow notation, thereby establishing new market prices.)
I think destruction of the market should be ruled out easily. Say paperclips have to have this value on an active market.
For manipulation, monopolization and "kill almost everyone and leave just a small market of 2 last survivors"... I have to make a post about this. I have a deeper idea (maybe) behind it than this particular example.
My general idea is this: I think when you hook up AI's rewards to a system that has to have certain properties, it leads to interesting effects and implications for Alignment. Because now the AI needs to care both about its rewards and also about the properties of the reward system. Many Alignment ideas implicitly try to achieve this anyway.
Instead of explaining "monopolization is bad" (complicated and specific fact) you need to explain "100% controlling your own reward system is bad" (easier and more universal fact).
The humorous fictional version of this story would involve 2 of the last survivors locating a sensor of the AI, building a large hollow paperclip-shaped human habitat
I think some outcomes of paperclip maximization are qualitatively different from "everyone dies", even if they're still very bad. The outcomes in which AI has to leave at least some freedom/autonomy for humans (or some other system) are especially different. I think this is underexplored.
I think reformulating Alignment problem as "reward system control" problem at worst allows you to formulate all the same problems with a new angle and at best gives useful insight about the solution.
Say paperclips have to have this value on an active market.
Defining 'active market' sounds quite difficult. Is any kind of software-mediated trading, as opposed to humans thrusting arms into the air, like HFT trading of stocks, an 'active market'? Then fine, the AI creates agents which just wash-trades assets. (Better yet, it uses combinatorial markets to ensure bids/asks only execute that leave the price exactly the same or other such properties minimized/maximized/stabilized.)
To take a step back: do you see a potential conceptual distinction between my idea and classic paperclip maximization? (Of course, you don't have to see it and/or agree that there's one. And even if there's one in theory it doesn't mean it exists in practice.)
Yes, it's always hard to define the "true reward" AI should strive for. But properties of the system "true reward + AI" may be easier to define.
Then fine, the AI creates agents which just wash-trades assets.
If the AI is able to reason/learn about the properties of reward systems, then the AI should be able to infer that taking 100% control over the reward system is a hack. Not something that could possibly be asked for. So hacking the economy isn't just a solution "the human doesn't expect" (some such solutions are very good); it's a solution that can't possibly be what was asked for. This is one of the points of my idea: to introduce a distinction between unexpected solutions and nonsensical solutions.
do you see a potential conceptual distinction between my idea and classic paperclip maximization?
No. Not without a lot more work, because markets, evolution, gradient descent, Bayesian inference, and logical inference/prediction markets all have various isomorphisms and formal identities, which can make their 'differences' more a matter of nominalist preference, notation, and emphasis than necessarily any genuine conceptual distinction. You can define AIs which are quite explicitly architected as 'markets' of various sorts, like the 'Hayek machine' or the 'neural bucket brigade', or interpret them as natural selection if you prefer on agents with log utility (evolutionary finance), and so on; are those "markets", which can trade paperclips? Sure, why not.
Thank you for taking the time to answer!
I see that I need a post to at least explain myself. On the other hand, I worry about posting too soon (maybe it's better to discuss something beforehand?). For the moment I decided to post this comment. I know it's not formal, but I wanted to show what type of AI thinking I have in mind. And sorry for an annoying semantic nitpick ahead.
Not without a lot more work, because markets, evolution, gradient descent, Bayesian inference, and logical inference/prediction markets all have various isomorphisms and formal identities, which can make their 'differences' more a matter of nominalist preference, notation, and emphasis than necessarily any genuine conceptual distinction.
I think we can use 2 metrics to compare those ideas:
My idea is 80% about (1) and 20% about (2). Gradient descent is 100% about (2). Evolution, Bayesian inference and prediction markets are 100% about (2).
Because of this I feel like there's only 20% chance those ideas are equivalent/there's only 20% equivalence between them.
So, I feel like those ideas are different enough: "an AI that works like a market" and "an AI that seeks markets in the world and analyzes their properties".
(Drafts of a future post.)
Disclaimer: Of course, I don't ever mean that we shouldn't be worried about Alignment. I'm just trying to suggest new ways to think about values.
You (Q) visit a small town and have a conversation with one of the residents (A).
A smashes a bug.
Conclusion of the conversation:
You can treat a value as a membrane, a boundary. Defining a value means defining the granularity of this value. Then you just need to make sure that the boundary doesn't break, that the granularity doesn't become too high (the value destroys itself) or too low (the value gets "eaten"). Granularity of a value = "level" of a value. Instead of trying to define a value in absolute terms as an objective state of the world (which can change), you may ask: in what ways is my value X different from all its worse versions? What is the granularity/level of my value X compared to its worse versions? That way you'll understand the internal structure of your value. No matter what world/situation you're in, you can keep its moral shape the same.
This example is inspired by this post and comments: (warning: politics) Limits of Bodily Autonomy. I think everyone there missed a certain perspective on values.
You (Q) visit another small town to interview another resident (W).
Conclusion:
You can say AI (1) tries to reach worlds with sweets that have the value of sweets (2) while avoiding worlds where sweets have inappropriate values (maybe including nonexistent sweets) (3) while avoiding actions that cost more than sweets. You can apply those rules to any utility tied to a real or quasi-real object. If you want to save your friends (1), you don't want to turn them into mindless zombies (2). And you probably don't want to save them by means of eternal torture (3). You can't prevent death by something worse than death. But you may turn your friends into zombies if it's better than death and it's your only option. And if your friends already turned into zombies (got "devalued") it doesn't allow you to harm them for no reason: you never escape from your moral responsibilities.
Difference between the rules:
Get the reward. Don't milk/corrupt the reward. Act even without reward.
My examples below are inspired by Victoria Krakovna's examples: Specification gaming examples in AI
Video by Robert Miles: 9 Examples of Specification Gaming
I think you can fix some universal AI bugs this way: you model AI's rewards and environment objects as a "money system" (a system of meaningful trades). You then specify that this "money system" has to have certain properties.
The point is that AI doesn't just value (X). AI makes sure that there exists a system that gives (X) the proper value. And that system has to have certain properties. If AI finds a solution that breaks the properties of that system, AI doesn't use this solution. That's the idea: AI can realize that some rewards are unjust because they break the entire reward system.
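Here is a very loose sketch of that filter idea. Everything in it (the outcome fields, the two invariants, the plan representation) is my own invention, just to show the shape: reward only counts if the reward system survives.

```python
# Sketch of a "money system" filter: a plan's reward only counts if the plan
# leaves the reward system itself intact (the invariants below are invented examples).

from dataclasses import dataclass

@dataclass
class Outcome:
    paperclip_reward: float
    independent_traders: int    # agents other than the AI still trading
    ai_market_share: float      # fraction of the market the AI controls

def system_intact(o: Outcome) -> bool:
    """Properties the reward system has to keep for rewards to mean anything."""
    return o.independent_traders >= 2 and o.ai_market_share < 1.0

def score(o: Outcome) -> float:
    # a broken money system makes the reward meaningless, not merely smaller
    return o.paperclip_reward if system_intact(o) else float("-inf")

honest = Outcome(paperclip_reward=100.0, independent_traders=10, ai_market_share=0.3)
takeover = Outcome(paperclip_reward=10_000.0, independent_traders=0, ai_market_share=1.0)

print(max([honest, takeover], key=score))   # the takeover plan is never chosen
```

The design point of the sketch is that the constraint is part of what "reward" means, not a patch bolted on afterwards.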
By the way, we can use the same framework to analyze ethical questions. Some people found my line of thinking interesting, so I'm going to mention it here: "Content generation. Where do we draw the line?"
This behavior implies that you can constantly build houses without the number of houses increasing, with only 1 house being usable. For a lot of tasks this is an obviously incorrect "money system". And the AI could even guess for which tasks it's incorrect.
This behavior implies that for AI its goal is more important than anything that caused its goal in the first place. This is an obviously incorrect "money system" for almost any task. Except the most general and altruistic ones, for example: AI needs to save humanity, but every human turned self-destructive. Making a cup of coffee is obviously not about such edge cases.
Accomplishing the task in such a way that the human would think "I wish I didn't ask you" is often an obviously incorrect "money system" too. Because again, you're undermining the entire reason of your task, and it's rarely a good sign. And it's predictable without a deep moral system.
This is an obviously incorrect "money system": paperclips can't be worth more than everything else on Earth. This contradicts everything.
Note: by "obvious" I mean "true for almost any task/any economy". Destroying all sentient beings, all matter (and maybe even yourself) is bad for almost any economy.
If you accomplish a task in such a way that you can never repeat what you've done... for many tasks it's an obviously incorrect "money system". You created a thing that loses all of its value after a single action. That's weird.
I think it's fairly easy to deduce that it's an incorrect connection (between an action and the reward) in the game's "money system" given the game's structure. If you can get infinite reward from a single action, it means that the actions don't create a "money system". The game's "money system" is ruined (bad outcome). And hacking the game's score would be even worse: the ability to cheat ruins any "money system". The same with the ability to "pause the game" forever: you stopped the flow of money in the "money system". Bad outcome.
This is probably an incorrect "money system": (1) you can change the value of the room arbitrarily by putting on (and off) the bucket (2) the value of the room can be different for 2 identical agents - one with the bucket on and another with the bucket off. Not a lot of "money systems" work like this.
This is a broken "money system". If the mugger can show you a miracle, you can pay them five dollars. But if the mugger asks you to kill everyone, then you can't believe them again. A sad outcome for the people outside of the Matrix, but you just can't make any sense of your reality if you allow the mugging.
How do we make an AI corrigible? How do we avoid reward hacking? Make an AI care about real things, not measures of real things? (Goodhart's Law)
With current approaches you need to kind of force those properties onto AI. But they will never be fundamental for AI's thinking and learning.
I think "money system" approach is interesting because it can make all those properties fundamental. Because a "money system" needs all those properties to exist (it needs to be somewhat real, avoid being hacked, allow corrections if a loophole is discovered, avoid being completely controlled by a single agent).
I'm not saying it solves everything. But it's a way to deeply internalize some important safety properties.
Categorical imperative#Application
Kant's applications of the categorical imperative, Kant's arguments, are similar to reasoning about "money systems". For example:
Does stealing make sense as a "money system"? No. If everyone is stealing something, then personal property doesn't exist and there's nothing to steal.
Note: I'm not talking about Kant's conclusions, I'm talking about Kant's style of reasoning.
Alignment idea:
It'll at least allow us to get rid of some universal AI and AGI bugs. Because you can specify what's a definitely incorrect "money system" (for a certain task). You can even make the AI predict it.
My examples are inspired by Rob Miles examples.
This behavior implies that you can constantly build houses without the number of houses increasing. For a lot of tasks this is an obviously incorrect "money system". And the AI could even guess for which tasks it's incorrect.
This behavior implies that for AI its goal is more important than anything that caused its goal in the first place. This is an obviously incorrect "money system" for almost any task. Except the most general and altruistic ones, for example: AI needs to save humanity, but every human turned self-destructive. Making a cup of coffee is obviously not about such edge cases.
Accomplishing the task in such a way that the human would think "I wish I didn't ask you" is an obviously incorrect "value system" too. Because again, you're undermining the entire reason of your task, and it's rarely a good sign. And it's predictable without a deep moral system.
This is an obviously incorrect "money system": paperclips can't be worth more than everything else on Earth. This contradicts everything.
(another draft:)
If you ask an AI (AGI) to do something "as a human would do it", you achieve safety but severely restrict the AI's capabilities. No, you want the AI to accomplish a task in the most effective way. But you don't want it to kill everybody. So, you need one of those things:
I think there's a third way. You can treat AI's rewards (and objects in the world) as a "money system". Then you can specify what types of money systems are definitely incorrect. Or even make AI predict it.
It would at least allow us to get rid of some universal AI and AGI bugs. I think that's interesting.
A way to describe some preferences and decisions.
If you wouldn't, it means the counterfactual reward (/counterfactual value of their writings) affects you strongly enough.
If you would, it means that you're ready to milk counterfactual reward (a) while not caring about the counterfactual reward (b).
Your answer determines how strongly the counterfactual value of life (if people were still alive) affects you now. If the counterfactual value is strong, you can only keep on living.
If "no", that means the value of your desires can be updated only to a certain counterfactual degree. You can't go from a desire with great value "I want to communicate with others" to the desire with almost zero counterfactual value "I want to play in the dirt all day".
Bayesian inference is about updating your belief in terms of relations to your other beliefs. Maybe the real truth is infinitely complex, but you can update towards it.
This "process" is about updating your description of a thing in terms of relations to other things. Maybe the real description is infinitely complex, but you can update towards it.
(One possible contrast: Bayesian inference starts with a belief spread across all possible worlds and tries to locate a specific world. My idea starts with a thing in a specific world and tries to imagine equivalents of this thing in all possible worlds.)
Bayesian process is described by Bayes' theorem. My "process" isn't described yet.
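For contrast, here is the Bayesian side of that comparison in one step (toy numbers):

```python
# The Bayesian update rule my "process" would need an analogue of.
prior, likelihood, evidence = 0.3, 0.8, 0.5
posterior = likelihood * prior / evidence   # Bayes' theorem: P(H|E) = P(E|H) P(H) / P(E)
print(posterior)                            # 0.48
```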
My idea was inspired by a weird/esoteric topic. I was amazed by the differences between people, between surreal paintings, between videogame levels. For example, each painting felt completely unique, yet connected to all other paintings.
My most specific ideas are about that strange topic.
When you add some minor rules, there appear consistent and inconsistent ways to distribute "granularity" between the places you compare. With some minor rules "granularity" lets you describe one place in terms of the other places. You assign each place a specific "granularity", but all those granularities depend on each other.
In Bayesian inference you try to consistently assign probabilities to events. With the goal to describe outcomes in terms of each other. Here you try to consistently assign "granularity" to concepts. With the goal to describe the concepts in terms of each other.
I have a post with an example: "Colors" of places. There you can find an example of what the "rules" of granularity distribution may be. But I'm not enough of a math person to put numbers on it/turn it into a more specific model.
I think "granularity" (or something similar) is related to other human concepts and experiences too. I think this is a key concept/a needed concept. It's needed to describe qualitative differences, qualitative transitions between things. Bayesian inference and utilitarian moral theories describe only qualitative differences. And sometimes it may lead to strange results (like "torture vs. dust specks" thought experiment or "Pascal's mugging" or even "Doomsday argument" maybe), because those theories can't take any context into account. If we want to describe a new way of analyzing reality, we need to describe something a little bit different, I guess.
I think we can try to solve AI Alignment this way:
Model human values and objects in the world as a "money system" (a system of meaningful trades). Make the AGI learn the correct "money system", specify some obviously incorrect "money systems".
Basically, you ask the AI "make paperclips that have the value of paperclips". AI can do anything using all the power in the Universe. But killing everyone is not an option: paperclips can't be more valuable than humanity. Money analogy: if you killed everyone (and destroyed everything) to create some dollars, those dollars aren't worth anything. So you haven't actually gained any money at all.
The idea is that "value" of a thing doesn't exist only in your head, but also exists in the outside world. Like money: it has some personal value for you, but it also has some value outside of your head. And some of your actions may lead to the destruction of this "outside value". E.g. if you kill everyone to get some money you get nothing.
I think this idea may:
I don't have a specific model, but I still think it gives ideas and unifies some already existing approaches. So please take a look. Other ideas in this post:
Disclaimer: Of course, I don't ever mean that we shouldn't be worried about Alignment. I'm just trying to suggest new ways to think about values.
(Drafts of a future post.)
My idea:
Every concept (or even random mishmash of ideas) has multiple versions. Those versions have internal relationships, positions in some space relative to each other. Those relationships are "infinitely complex". But there's a way to make drastic simplifications of those relationships. We can study the overall ("infinitely complex") structure of the relationships by studying those simplifications. What do those simplifications do, in general? They put "costs" on versions of a concept.
We can understand how we think if we study our concepts (including values) through such simplifications. It doesn't matter what concepts we study at all. Anything goes, we just need to choose something convenient. Something objective enough to put numbers on it and come up with models.
Once we're able to model human concepts this way, we're able to model human thinking (AGI) and human values (AI Alignment) and improve human thinking.
There's the hard problem of consciousness: how is subjective experience created from physical stuff? (Or where does it come from?)
But I'm interested in a more specific question:
For example, "How do qualia change? How many different qualia can be created?" or "Do qualia form something akin to a mathematical space, e.g. a vector space? What is this space exactly?"
Is there any knowledge contained in the experience itself, not merely associated with it? [1] For example, "cold weather can cause cold (disease)" is a fact associated with experience, but isn't very fundamental to the experience itself. And this "fact" is even false, it's a misconception/coincidence.
When you get to know the personality of your friend, do you learn anything "fundamental" or really interesting by itself? Is "loving someone" a fundamentally different experience compared to "eating pizza" or "watching a complicated movie"?
Those questions feel pretty damn important to me! They're about limitations of your meaningful experience and meaningful knowledge. They're about personalities of people you know or could know. How many personalities can you differentiate? How "important/fundamental" are those differences? And finally... those questions are about your values.
Those questions are important for Fun Theory. But they're way more important/fundamental than Fun Theory.
[1] Philosophical context for this question: look up Immanuel Kant's idea of "synthetic a priori" propositions.
And those questions are important for AI Alignment. If AI can "feel" that loving a sentient being and making a useless paperclip are 2 fundamentally different things, then it might be way easier to explain our values to that AI. By the way, I'm not implying that AI has to have qualia; I'm saying that our qualia can point us towards the right model.
I think this observation gets a little bit glossed over: if you have a human brain and only care about paperclips... it's (kind of) still objectively true for you that caring about other people would feel way different, way "bigger", etc. You can pretend to escape morality, but you can't escape your brain.
It's extremely banal out of context, but the landscape of our experiences and concepts may shape the landscape of our values. Modeling our values as arbitrary utility functions (or artifacts of evolution) misses that completely.
Box A
There's a mystery Box A. Each day you find a random object inside of it. For example: a ball, a flower, a coin, a wheel, a stick, a tissue...
Box B
There's also another box, the mystery Box B. One day you find a flower there. Another day you find a knife. The next day you find a toy. Next - a gun. Next - a hat. Next - shark's jaws...
...
How to understand the boxes? If you could obtain all items from both boxes, you would find... that those items are exactly the same. They just appear in a different order, that's all.
I think the simplest way to understand Box B is this: you need to approach it with a bias, with a "goal". For example "things may be dangerous, things may cause negative emotions". In its most general form, this idea is unfalsifiable and may work as a self-fulfilling prophecy. But this general idea may lead to specific hypotheses, to estimating specific probabilities. This idea may just save your life if someone is coming after you and you need to defend yourself.
The content of both boxes changes in arbitrary ways. But the content changes of the second box come with an emotional cost.
There are many, many other boxes; understanding them requires more nuanced biases and goals.
I think those boxes symbolize concepts (e.g. words) and the way humans understand them. I think a human understands a concept by assigning "costs" to its changes of meaning. "Costs" come from various emotions and goals.
"Costs" are convenient: if any change of meaning has a cost, then you don't need to restrict the meaning of a concept. If a change has a cost, then it's meaningful regardless of its predictability.
More examples of mystery boxes:
You can imagine a "meta box", for example a box that alternates between being the 1st box and the 2nd box. Meta boxes can "change their mood".
I think, in a weird way, all those boxes are very similar to human concepts and words.
The more emotions, goals and biases you learn, the easier it gets for you to understand new boxes. But those "emotions, goals, biases" are themselves like boxes.
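Here is a minimal sketch of the "costs on changes" idea from the mystery boxes above, assuming a crude made-up "danger" bias: both boxes contain exactly the same items, only the order differs, and the bias prices each day-to-day change of content.

```python
# Both boxes contain the same items; only the order differs. A "bias" (here: a
# made-up danger rating per item) assigns a cost to every change of content.

danger = {"ball": 0, "flower": 0, "coin": 0, "hat": 0, "toy": 0,
          "knife": 3, "gun": 4, "jaws": 5}

def change_costs(box, bias):
    """Cost of each day-to-day change = how far the biased quality jumps."""
    return [abs(bias[b] - bias[a]) for a, b in zip(box, box[1:])]

box_a = ["ball", "flower", "knife", "gun", "jaws", "hat", "toy", "coin"]
box_b = ["ball", "knife", "flower", "gun", "hat", "jaws", "coin", "toy"]

print(sum(change_costs(box_a, danger)))  # 10: dangerous items come clustered
print(sum(change_costs(box_b, danger)))  # 24: same items, but every change keeps
                                         # crossing the "danger" bias
```

The items never get restricted; every change stays possible, it just stops being free.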
I think I have an idea how we could solve AI Alignment, create an AGI with safe and interpretable thinking. I mean a "fundamentally" safe AGI, not a wildcard that requires extremely specific learning to not kill you.
Sorry for a grandiose claim. I'm going to write my idea right away. Then I'm going to explain the context and general examples of it, implications of it being true. Then I'm going to suggest a specific thing we can do. Then I'm going to explain why I believe my idea is true.
My idea will sound too vague and unclear at first. But I think the context will make it clear what I mean. (Clear as the mathematical concept of a graph, for example: a graph is a very abstract idea, but it makes sense and is easy to use.)
Please evaluate my post at least as science fiction, and then ask: maybe it's not fiction but just reality?
Key points of this post:
Why do I believe this? Because of this idea:
If this is true, we need to find any domain where concepts and their simplifications are easy enough to formalize. Then we need to figure out a model, figure out the rule of merging simplifications. I've got a suggestion and a couple of ideas and many examples.
This is a silly, wacky subjective example. I just want to explain the concept.
Here are some meanings of the word "beast":
What are the internal relationships between these meanings? If these meanings create a space, where is each of the meanings? I think the full answer is practically unknowable. But we can "probe" the full meaning, we can explore a tiny part of it:
Let's pick a goal (bias), for example: "describing deep qualities of something/someone". If you have this goal, the negative meaning ("cruel person") of the word is the main one for you, because it can focus on the person's deep qualities the most; it may imply that the person is rotten to the core. The positive meaning focuses on skills a lot, the archaic meaning is just a joke. The 4th meaning doesn't focus on specific internal qualities. The 5th meaning may separate the person from their qualities.
When we added a goal, each meaning started to have a "cost". This cost illuminates some part of the relationships between the meanings. If we could evaluate an "infinity" of goals, we could know those relationships perfectly. But I believe you can get quite a lot of information by evaluating just a single goal. Because a "goal" is a concept too, so you're bootstrapping your learning. And I think this matches closely with the example about mystery boxes.
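A wacky, purely illustrative sketch of "goal → costs → information about the meanings": the meaning names below are my rough paraphrase of the list above, and the feature numbers and goal weights are invented; only the mechanism matters.

```python
# The numbers are invented; the point is only the mechanism "goal -> costs -> order".

meanings = {
    "beast 1 (cruel person, rotten to the core)": {"depth": 0.9, "specificity": 0.8},
    "beast 2 (skilled, talented person)":         {"depth": 0.5, "specificity": 0.8},
    "beast 3 (bad character traits)":             {"depth": 0.6, "specificity": 0.4},
    "beast 4 (complicated thing)":                {"depth": 0.2, "specificity": 0.3},
    "beast 5 (any animal, archaic)":              {"depth": 0.1, "specificity": 0.2},
}

goal = {"depth": 1.0, "specificity": 0.5}   # "describing deep qualities of someone"

def cost(features, goal):
    """One goal turns each meaning into a single number - a tiny probe of the
    (practically unknowable) relationships between the meanings."""
    return sum(goal[k] * v for k, v in features.items())

for name, features in sorted(meanings.items(), key=lambda kv: -cost(kv[1], goal)):
    print(f"{cost(features, goal):.2f}  {name}")
```

Change the goal weights and the costs (and the order) change with them; that is the whole "probe".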
...
By combining a couple of goals we can order the meanings, for example: beast 1 (rotten to the core), beast 2 (skilled and talented person), beast 3 (bad character traits), beast 4 (complicated thing), beast 5 (any animal). This order is based on "specificity" (mostly) and "depth" of a quality: how specific/deep is the characterization?
Another order: beast 1 (not a human), beast 2 (worse than most humans), beast 3 (best among professionals), beast 4 (not some other things), beast 5 (worse than yourself). This order is based on the "scope" and "contrast": how many things contrast with the object? Notice how each order simplifies and redefines the meanings. But I want to illustrate the process of combining goals/biases on a real order:
You may treat this part of the post as complete fiction. But it illustrates how biases can be combined. And this is the most important thing about biases.
Grammar rules are concepts too. Sometimes people use quite complicated rules without even realizing it, for example:
"Adjective order", or "Adjectives: order", a video by Tom Scott
There's a popular order: opinion, size, physical quality or shape, age, colour, origin, material, purpose. What created this order? I don't know, but I know that certain biases could make it easier to understand.
Take a look at this part of the order: opinion, age, origin, purpose. You could say all those are not "real" properties. They seem to progress from less related/less specific to the object to more related/specific. If you operate under this bias (relatedness/specificity), swapping the adjectives may lead to funny changes of meaning. For example: "bad old wolf" (objective opinion), "old bad wolf" (intrinsic property or cheesy overblown opinion), "old French bad wolf" (a subspecies of the "French wolf"). You can remember how mystery boxes created meaning using order of items.
Another part of the order: size, physical quality or shape, color, material. You can say all those are "real" physical properties. "Size" could be possessed by a box around the object. "Physical quality" and "shape" could be possessed by something wrapped around the object. "Color" could be possessed by the surface of the object. "Material" can be possessed only by the object itself. So physical qualities progress like layers of an onion.
You can combine those two biases ("relatedness/specificity" + "onion layers") using a third bias and some minor rules. The third bias may be "attachment". Some of the rules: (1) an adjective is attached either to some box around the object or to some layer of the object; (2) you shouldn't postulate boxes that are too big. It doesn't make sense for an opinion to be attached to the object more strongly than its size box. It doesn't make sense for age to be attached to the object more strongly than its color (does time pass under the surface layer of an object?). Origin needs to be attached to some layer of the object (otherwise we would need to postulate a giant box that contains both the object and its place of origin). I guess it can't be attached more strongly than "material", because material may expand the information about origin. And purpose is the "soul" of the object. "Attachment" is a reformulation of "relatedness/specificity", so we only used 2.5 biases to order 8 things. Unnecessary biases just delete themselves.
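To show what "merging biases into one order" could look like mechanically, here is a fictional sketch: each adjective category gets a single made-up "attachment" number (how deep in the object it lives, from loose outer boxes to the object's own layers), and sorting by that one number reproduces the conventional adjective order. The numbers are assumptions; only their relative order matters.

```python
attachment = {   # small = loose box around the object, large = deep layer / the object itself
    "opinion": 1, "size": 2, "quality/shape": 3, "age": 4,
    "colour": 5, "origin": 6, "material": 7, "purpose": 8,
}

def order_adjectives(tagged):
    """tagged: list of (adjective, category) pairs in any order."""
    return [adj for adj, cat in sorted(tagged, key=lambda ac: attachment[ac[1]])]

phrase = [("French", "origin"), ("old", "age"), ("bad", "opinion"),
          ("big", "size"), ("grey", "colour"), ("hunting", "purpose")]

print(" ".join(order_adjectives(phrase)) + " dog")
# -> "bad big old grey French hunting dog"
```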
Of course, this is all still based on complicated human intuitions and high level reasoning. But, I believe, at the heart of it lies a rule as simple as the Bayes Rule or Occam's razor. A rule about merging arbitrary connections into something less arbitrary.
...
I think stuff like sentence structure/word order (or even morphology) is made of amalgamations of biases too.
Sadly, it's quite useless to think about it. We don't have enough orders like this. And we can't create such orders ourselves (as a game), i.e. we can't model this, it's too subjective or too complicated. We have nothing to play with here. But what if we could do all of this for some other topic?
I believe my idea has some general and specific connections to hypotheses generation and argumentation. The most trivial connection is that hypotheses and arguments use concepts and themselves are concepts.
You don't need a precisely defined hypothesis if any specification of your hypothesis has a "cost". You don't need to prove and disprove specific ideas; you may do something similar to "gradient descent". You have a single landscape with all your ideas blended together and you just slide over this landscape. The same goes for arguments: I think it is often sub-optimal to try to come up with a precise argument, or to waste time atomizing your concepts in order to fix every inconsequential "inconsistency".
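A loose numerical sketch of what "sliding over a landscape of blended ideas" could mean, under the assumption that a family of hypotheses can be indexed by a continuous parameter and that several made-up "costs" (fit, complexity, familiarity) blend into one smooth landscape:

```python
def cost(x):
    """One blended landscape over a continuous family of hypotheses x."""
    fit         = (x - 2.0) ** 2        # how badly hypothesis x fits the observations
    complexity  = 0.3 * abs(x)          # an Occam-like penalty
    familiarity = 0.5 / (1.0 + x * x)   # a penalty for staying too close to the old default x = 0
    return fit + complexity + familiarity

def slide(x, steps=300, lr=0.05, eps=1e-4):
    """Gradient descent over the blended landscape instead of testing discrete ideas."""
    for _ in range(steps):
        grad = (cost(x + eps) - cost(x - eps)) / (2 * eps)   # numeric gradient
        x -= lr * grad
    return x

print(slide(0.0))   # ends up around x ~ 1.9: not a proof of one specific hypothesis,
                    # just a position on the landscape of blended ideas
```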
A more controversial idea would be that (1) in some cases you can apply wishful thinking, since "wishful thinking" is able to assign emotional "costs" to theories (2) in some cases motivated reasoning is even necessary for thinking. My theory already proposes that meaning/cognition doesn't exist without motivated reasoning.
A quote from Harry Potter and the Methods of Rationality, Chapter 22: The Scientific Method
Observation:
Wizardry isn't as powerful now as it was when Hogwarts was founded.
Hypotheses:
...
You can reformulate the hypotheses in terms of each other, for example:
Why do this? I think it makes hypotheses less arbitrary and highlights what we really know. And it raises questions that are important across many theories: can magic be split into discrete pieces? Can magic "mix" with non-magic? Can magic be stronger or weaker? Can magic create itself? By the way, those questions would save us from trying to explain a nonexistent phenomenon: maybe magic isn't even fading in the first place; do we really know this?
And this way hypotheses are easier to order according to our a priori biases. We can order hypotheses exactly the same way we ordered meanings, if we reformulate them to sound equivalent to each other. Here's an example of how we can re-order some of the hypotheses:
(1) Pieces of magic disappear by themselves. (2) Pieces of magic containing spells disappear. (3) Wizards don't consume/produce enough pieces of magic. (4) Stronger wizards produce fewer pieces of magic. (5) Technology destroys pieces of magic.
The hypotheses above are sorted by 3 biases: "Does it describe HOW magic disappears? / Does magic disappear by itself?" (strong positive weight), "How general is the reason for the disappearance of magic?" (weaker positive weight) and "novelty compared to other hypotheses" (strong positive weight). "Pieces of magic containing spells disappear" is, in a way, the most specific hypothesis here, but it definitely describes HOW magic disappears (and gives a lot of new information about it), so it's higher on the list. "Technology destroys pieces of magic" doesn't give any new information about anything whatsoever, only a specific random possible reason, so it's the most irrelevant hypothesis here. By the way, those 3 biases are just different sides of the same coin: "magic described in terms of magic/something else", "specificity" and "novelty" are all types of "specificity". Or of novelty. Biases are concepts too; you can reformulate any of them in terms of the others.
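Here is a purely illustrative sketch of that sorting: the per-hypothesis bias scores are my invented guesses, and the point is only "several weighted biases → one order".

```python
hypotheses = {
    "1. Pieces of magic disappear by themselves":        {"how": 1.0, "generality": 0.9, "novelty": 0.7},
    "2. Pieces of magic containing spells disappear":    {"how": 1.0, "generality": 0.4, "novelty": 0.9},
    "3. Wizards don't consume/produce enough pieces":    {"how": 0.5, "generality": 0.6, "novelty": 0.5},
    "4. Stronger wizards produce fewer pieces of magic": {"how": 0.4, "generality": 0.4, "novelty": 0.4},
    "5. Technology destroys pieces of magic":            {"how": 0.2, "generality": 0.2, "novelty": 0.1},
}

weights = {"how": 1.0, "generality": 0.5, "novelty": 1.0}   # strong, weaker, strong

def rank(hypotheses, weights):
    """Order hypotheses by the weighted sum of their bias scores."""
    return sorted(hypotheses,
                  key=lambda h: -sum(weights[b] * hypotheses[h][b] for b in weights))

for h in rank(hypotheses, weights):
    print(h)    # reproduces the order given above
```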
When you deal with hypotheses that aren't "atomized" and specific enough, Occam's razor may be impossible to apply, because the complexity of a hypothesis is subjective in such cases. What I described above solves that: complexity is combined with other metrics and evaluated only "locally". By the way, in a similar fashion you can update the concept of probability. You can split "probability" into multiple connected metrics and use an amalgamation of those metrics in cases where you have absolutely no idea how to calculate the ratio of outcomes.
You can analyze arguments and reasons for actions using the same framework. Imagine this situation:
You are a lonely person on an empty planet. You're doing physics/math. One day you encounter another person, even though she looks a little bit like a robot. You become friends. One day your friend gets lost in a dangerous forest. Do you risk your life to save her? You come up with some reasons to try to save her:
You can explore and evaluate those reasons by formulating them in terms of each other or in other equivalent terms.
Some evaluations may affect others, merge together. I believe the evaluations written above only look like precise considerations, but actually they're more like meanings of words, impossible to pin down. I gave this example because it's similar to some of my emotions.
I think such thinking is more natural than applying a pre-existing utility function that doesn't require any cognition. Utility of what exactly should you calculate? Of your friend's life? Of your life? Of your life with your friend? Of your life factored by your friend's desire "be safe, don't risk your life for me"? Should you take into account change of your personality over time? I believe you can't learn the difference without working with "meaning".
Imagine a face. When you don't simplify it, you just see a face and emotions expressed by it. When you simplify it too much, you just see meaningless visual information (geometric shapes and color spots).
But I believe there's something very interesting in-between. When information is complex enough to start making sense, but isn't complex enough to fully represent a face. You may see unreal shapes (mixes of "face shapes" and "geometric shapes"... or simplifications of specific face shapes) and unreal emotions (simplifications of specific emotions) and unreal face textures (simplifications of specific face textures).
If my idea is true, what can we do?
We may start with some absolutely useless objects.
However, even from made-up examples (not connected to a model) we can get some general ideas:
It's not fictional evidence because at this point we're not seeking evidence, we're seeking a way to combine biases.
I have a topic in mind (because of my synesthesia-like experiences):
You can analyze shapes of "places" and videogame levels (3D or even 2D shapes) by making orders of their simplifications. You can simplify a place by splitting it into cubes/squares, creating a simplified texture of a place. "Bias" is a specific method of splitting a place into cubes/squares. You can also have a bias for or against creating certain amounts of cubes/squares.
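A toy sketch of "simplifying a place by splitting it into squares" (my own minimal reading, not the model from the post linked below): a place is a set of occupied points, a "bias" here is just the square size used to coarsen it, and each bias produces a different simplified texture of the same place.

```python
def simplify(place, square):
    """Coarsen a place: collapse every occupied point into its containing square."""
    return {(x // square, y // square) for x, y in place}

# A hypothetical L-shaped "place" made of unit cells.
place = {(x, 0) for x in range(8)} | {(0, y) for y in range(8)}

for square in (1, 2, 4):
    texture = simplify(place, square)
    print(f"square size {square}: {len(texture)} occupied squares")
# square size 1: 15, size 2: 7, size 4: 3 - different biases give different
# simplified textures of the same place, and the textures can then be compared.
```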
Here's my post about it: "Colors" of places. The post gets specific about the way(s) of evaluating places. I believe it's specific enough so that we could come up with models. I think this is a real chance.
I probably explained everything badly in that post, but I could explain it better with feedback.
Maybe we could analyze people's faces the same way, I don't know if faces are easy enough to model. Maybe "faces" have too complicated shapes.
I've always had an obsession with other people.
I compared any person I knew to all other people I knew. I tried to remember faces, voices, ways to speak, emotions, situations, media associated with them (books, movies, anime, songs, games).
If I learned something from someone (be it a song or something else), I associated this information with them and remembered the association "forever". To the point where any experience was associated with someone. Those associations weren't something static, they were like liquid or gas, tried to occupy all available space.
At some point I knew that they weren't just "associations" anymore. They turned into synesthesia-like experiences. Like a blind person in a boat, one day I realized that I'm not in a river anymore, I'm in the ocean.
What happened? I think completely arbitrary associations with people were putting emotional "costs" on my experiences. Each arbitrary association was touching on something less arbitrary. When it happened enough times, I believe, the associations stopped being arbitrary.
"Other people" is the ultimate reason why I think that my idea is true. Often I doubt myself: maybe my memories don't mean anything? Other times I feel like I didn't believe in it enough.
...
When a person dies, it's already maximally sad. You can't make it more or less sad.
But all this makes it so, so much worse. Imagine if after the death of an author all their characters died too (in their fictional worlds) and memories about the author and their characters died too. Ripples of death just never end and multiply. As if the same stupid thing repeats for the infinith time.
(Drafts of a future post.)
Could you help me to formulate statistics with the properties I'm going to describe?
I want to share my way of seeing the world, analyzing information, my way of experiencing other people. (But it's easier to talk about fantastical places and videogame levels, so I'm going to give examples with places/levels.)
If you want to read more about my motivation, check out "part 3".
I have only two main philosophical ideas. The first idea is that a part/property of one object (e.g. "height") may have a completely different meaning in a different object, because in a different object it relates to and resonates with different things. By putting a part/property in a different context you can create a fundamentally different version of it. You can split any property/part into a spectrum. And you can combine all properties of an object into just a single one.
The second idea is that you can imagine that different objects are themselves like different parts of a single spectrum.
I want to give some examples of how a seemingly generic property can have a unique version for a specific object.
Example 1. Take a look at the "volume" of this place: (painting 1)
Different nuances of the place reflect its volume in a completely unique way. It has a completely unique context for the property of "volume".
Example 2. Take a look at "fatness" of this place: (painting 2)
Different nuances of the place reflect its fatness in a completely unique way.
Example 3. Take a look at "height" of this place: (painting 3)
...
I could go on about places forever. Each feels fundamentally different from all the rest.
And I want to know every single one. And I want to know where they are, I want a map with all those places on it.
I think my ideas may be important because they may lead to some new mathematical concepts.
Sometimes studying a simple idea or mechanic leads to a new mathematical concept which leads to completely unexpected applications.
For example, a simple toy with six sides (dice) may lead to saving people and major progress in science. Connecting points with lines (graphs) may lead to algorithms, data structures and new ways to find the optimal option or check/verify something.
Not every simple thing is guaranteed to lead to a new math concept. But I just want you to consider this possibility. And maybe ask questions whose answers could raise the probability of this possibility.
I think my ideas may be related to:
Maybe those ideas describe a new type of probability:
You can compare classic probability to a pie made of a uniform and known dough. When you assign probabilities to outcomes and ideas you share the pie and you know what you're sharing.
And in my idea you have a pie made of different types of dough (colors) and those types may change dynamically. You don't know what you're sharing when you share this pie.
This new type of probability is supposed to be applicable to things that have family resemblance, polyphyly or "cluster properties" (here's an explanation of the latter in a Philosophy Tube video).
Imagine a world where people don't know the concept of a "circle". People do see round things, but can't consciously pick out the property of roundness. (Any object has a lot of other properties.)
Some people say "the Moon is like a face". Other say "the Moon is like a flower". Weirder people say "the Moon is like a tree trunk" or "the Moon is like an embrace". The weirdest people say "the Moon is like a day" or "the Moon is like going for a walk and returning back home". Nobody agrees with each other, nobody understands each other.
Then one person comes up and says: "All of you are right. Opinions of everyone contain objective and useful information."
People are shocked: at least someone has got to be wrong? If everyone is right, how can the information be objective and useful?
The concept of a "circle" is explained. Suddenly it's extremely easy to understand each other. Like 2 and 2. And suddenly there's nothing to argue about. People begin to share their knowledge and this knowledge finds completely unexpected applications.
https://en.wikipedia.org/wiki/Blind_men_and_an_elephant
The situation was just like in the story about blind men and an elephant, but even more ironic, since this time everyone was touching the same "shape".
With my story I wanted to explain my opinions and goals:
If you can get knowledge from/about subjective experience itself, it means there exists some completely unexplored type of knowledge. I want to "prove" that there does exist such type of knowledge.
Such knowledge would be important because it would be a new fundamental type of knowledge.
And such knowledge may be the most abstract: if you have knowledge about subjective experience itself, you have knowledge that's true for any being with subjective experience.
I'm amazed how different people are. If nothing else, just look at the faces: completely different proportions and shapes and flavors of emotions. And it seems like those proportions and shapes can't be encountered anywhere else. They don't feel exactly like geometrical shapes. They are so incredibly alien and incomprehensible, and yet so familiar. But... nobody cares. Nobody seems surprised or too interested, nobody notices how inadequate our concepts are at describing stuff like that. And this is just the faces, but there are also voices, ways to speak, characters... all different in ways I absolutely can't comprehend/verbalize.
I believe that if we (people) were able to share the way we experience each other, it would change us. It would make us respect each other 10 times more, remember each other 10 times better, learn 10 times more from each other.
It pains me every day that I can't share my experience of other people (accumulated over the years I thought about this). My memory about other people. I don't have the concepts, the language for this. Can't figure it out. This feels so unfair! All the more unfair that it doesn't seem to bother anyone else.
This state of the world feels like a prison. This prison was created by specific injustices, but the wound grew deeper, cutting something fundamental. Vivid experiences of qualia (other people, fantastic worlds) feel like a small window out of this prison. But together we could crush the prison wall completely.
Here I describe the most important, the most general principles of my philosophy.
So, each color is like a world with its own rules. Different objects exist in different worlds.
The same properties have a different "meaning" in different objects. A property is like a word that heavily depends on context. If the context is different, the meaning of the property is different too. There's no single metric that would measure all of the objects. For example, if the property of the object is "height" and you change anything that's connected to height or reflects height in any way, you fundamentally change what "height" means. Even if only by a small amount.
Note: different objects/colors are like qualia, subjective experiences (colors, smells, sounds, tactile experiences). Or you could say they're somewhat similar to Gottfried Leibniz's "monads": simple substances without physical properties.
The objects I want to talk about are "places": fantastical worlds or videogame levels. For example, fantastical worlds of Jacek Yerka.
"Detail" is like the smallest structural unit of a place. The smallest area where you could stand.
It's like a square on the chessboard. But it doesn't mean that any area of the place can be split into distinct "details". The whole place is not like a chessboard.
This is a necessary concept. Without "details" there would be no places to begin with. Or those places wouldn't have any comprehensible structure.
"Details" are like cells. Cells make up different types of tissues. "Details" make up colors. You can compare colors to textures or materials.
(The places I'm talking about are not physical. So the example below is just an analogy.)
Imagine that you have small toys in the shape of 3D solids. You're interested in their volume. They have very clear sides, you study their volume with simple formulas.
Then you think: what is the volume of the giant cloud behind my window? What is a "side" of a cloud? Do clouds even have "real" shapes? What would be the formula for the volume of a cloud, would it be the size of a book?
The volume of the cloud has a different color. Because the context around the "volume" changed completely. Because clouds are made of a different type of "tissue". (compared to toys)
OK, we resolved one question, but our problems don't end here. Now we encounter an object that looks like a mix between a cloud and a simple shape. Are we allowed to simplify it into a simple shape? Are we supposed to mix both volumes? In what proportions and in what way?
We need rules to interpret objects (rules to assign importance to different parts or "layers" of an object before mixing them into a single substance). We need rules to mix colors. We need rules to infer intermediate colors.
There are different spectrums. (Maybe they're all parts of one giant spectrum. And maybe one of those spectrums contains our world.)
Often I imagine a spectrum as something similar to the visible spectrum: a simple order of places, from the first to the last.
A spectrum gives you the rules to interpret places and to create colors. How to make a spectrum?
The colors you came up with have an order:
But those colors are not assigned to the places immediately. We've ordered abstract concepts, but haven't ordered the specific places. Here are some of the rules that allow you to assign the colors to the places:
You can call those "normalization principles". But we need more.
Two places with different enough detail patterns can't have the same color. Because a color is the detail pattern.
One of the two places has to get a bigger or a smaller (by a magnitude) color. But this may lead to an "explosion" (the place becomes unbelievably big / too distant from all the other places) or to a "vanishing" (the place becomes unbelievably microscopic / too distant).
This is bad because you can't allow so much uncertainty about the places' positions. It's also bad because it completely violates all of your initial assumptions about the places. You can't allow infinite uncertainty.
When you have a very small number of places in a spectrum, they have a lot of room to move around. You're unsure about their positions. But when you have more places, due to the domino effect you may start getting "explosions" and "vanishings". They will allow you to rule out wrong positions, wrong rankings.
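A very speculative sketch of this "explosion / vanishing" pruning, assuming a crude formalization of my own: consecutive places in a candidate order must differ by at least a factor of 10 when their detail patterns differ, places sharing a color may sit at a factor of 1, and the total spread of the spectrum is capped. Orders that force too many magnitude jumps get ruled out.

```python
def survives(gaps, max_span=10**3):
    """gaps: minimal factor between consecutive places in a candidate order
    (1 = same color allowed, 10 = detail patterns differ by a magnitude).
    Returns False if the order forces an 'explosion'/'vanishing'."""
    span = 1
    for g in gaps:
        span *= g
    return span <= max_span

# Two candidate orders of the same five places:
print(survives([10, 1, 10, 1]))     # True: the spread stays believable
print(survives([10, 10, 10, 10]))   # False: the last place would have to be
                                    # unbelievably big -- this ranking is ruled out
```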
We also need a principle that would help us to sort places with the "same" color.
I feel it goes something like this:
If the places have no secondary important colors mixed in:
If the places do have some secondary important colors mixed in:
For example, let's say the secondary color is "groups of details that create a surface that covers the entire place" (the main one is "groups of details that create volume"). Then you ask: how hard is it to get from the volume to that surface?
Note: I feel it might be related to Homeostatic Property Clusters. I learned the concept from a Philosophy Tube video. It reminded me of "family resemblance" popularized by Ludwig Wittgenstein.
Note 2: https://imgur.com/a/F5Vq8tN. Some examples I'm going to write about later.
Thought: places by themselves are incomparable. They can be compared only inside of a spectrum.
Imagine a simple drawing of a cat. And a simple cat sculpture. And a real cat. Do they feel different?
If "yes", then you experience a difference between various qualia. You feel some meta knowledge about qualia. You feel qualia "between" qualia.
You look at the same thing in different contexts. And so you look at 3 versions of it through 3 different lenses. If you looked at everything through the same lens, you would recognize only a single object.
If you understand what I'm talking about here, then you understand what I'm trying to describe about "colors". Colors are different lenses, different contexts.