The issue arises specifically in the situation of recursive self-improvement: you can't prove self-consistency in mathematical frameworks of "sufficient complexity" (that is, frameworks capable of expressing the rules of arithmetic).
What this cashes out to: consider an AI as a mathematical framework, and the next generation of AI (designed by the first) as a second mathematical framework. If they are of "sufficient complexity", you can't actually prove that there are no contradictions in an umbrella mathematical framework that comprises both of them. Which means an AI cannot -prove- that a successor AI has not experienced value drift - that is, that the combined mathematical framework does not contain contradictions.
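For concreteness, this is just Gödel's second incompleteness theorem, in its standard form (nothing in the statement is specific to the AI setting):

$$\text{If } T \text{ is consistent, recursively axiomatizable, and interprets arithmetic, then } T \nvdash \mathrm{Con}(T).$$

Take T to be the umbrella framework comprising creator, successor, and the embedded values; the creator, reasoning within a subsystem of T, cannot derive Con(T).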
To illustrate the issue, suppose the existence of a powerful creator AI, designing its successor; the successor, presumably, is more powerful than the creator AI in some fashion, and so there are areas of the combinatorially large space that the successor AI can explore (in a reasonable timeframe), but that the creator AI cannot. If the creator can prove there are no contradictions in the combined mathematical framework - then, supposing its values are embedded in that framework in a provable manner, it can be assured that the successor has not experienced value drift.
Mind, I don't particularly think the above scenario is terribly likely; I have strong doubts about basically everything in there, in particular the idea of provable values. I created the post for the five or six people who might still be interested in ideas I haven't seen kicked around on Less Wrong for over a decade.
Alternatively - we communicate about the things that pose the most danger to us, in a manner intended to minimize that danger.
In a typical Level-4 society, people don't have a lot to fear from lions and they aren't in imminent danger of starvation. The bottom half of Maslow's hierarchy is pretty stable.
It's the social stuff where our needs run the risk of being unfulfilled; it is the social stuff that poses the most danger. So of course most of the communication that takes place is about social stuff, in ways intended to reinforce our own social status. This isn't a simulacrum of reality - it is reality, and people suffer real harms for being insufficient to the task.
I think a substantial part of the issue here is the asymmetry created when one party is public, and one party is not.
Suppose a user is posting under their real name, John Doe, and another user is posting under a pseudonym, Azure_Pearls_172. An accusation by Azure against John can have real-world implications; an accusation by John against Azure is limited by the reach of the pseudonym. Azure can change their pseudonym and leave the accusations behind; John cannot.
Doxxing can make a situation more symmetrical in this case. Whether or not it is merited is a complicated topic, particularly as the norms around doxxing exist for a reason.
Suppose a user assaults other users, and switches pseudonyms whenever identified to keep finding new targets - I doubt anybody would argue that doxxing a predatory member of this sort is a bad thing, in and of itself. Contrariwise, suppose a user gets annoyed with another user, and then doxxes them and accuses them in bad faith of assault. We don't want that.
I think mixed-anonymity is basically a terrible way to run things, owing to the asymmetries involved, and in general communities should have norms that either reflect no anonymity (everybody uses their real names), or total anonymity (nobody uses their real names, and also nobody ever meets anybody else in person). If you're mixing the cases, you're creating the potential for abuses.
If you disagree that anonymous users should never meet in person, well - if you're willing to meet in person, why are you choosing to be anonymous? Is anonymity a conscious and deliberate decision, such that doxxing would actually be a violation (in which case, why are you doxxing yourself?), or is it just a default option? And if you're meeting another anonymous user - well, what is their reason for choosing to be anonymous?
Mind, I've met other pseudonymous users of various communities, so I can't 100% claim to be consistent with this. But I only do so when my choice of anonymity is more "default" than "deliberate choice" - there are some pseudonyms I use which I certainly wouldn't meet somebody under the auspices of, because they are deliberate choices, chosen to minimize exposure.
(Granted, I haven't used them in a while, and at this point most of the opinions I shared under them are basically widely accepted today, and those that aren't are at least socially acceptable - so, eh, it would probably be fine at this point.)
I am, unapologetically, a genius. (A lot of people here are.)
My experience of what it is like being a genius: I look at a problem and I know an answer. That's pretty much it. I'm not any faster at thinking than anybody else; I'd say I'm actually a somewhat slower thinker, but make up for it by having "larger" thoughts; most people seem to have fast multi-core processors, and I'm running a slightly slow graphics card. Depending on what you need done, I'm either many orders of magnitude better at it - or completely hopeless. It mostly depends on whether or not I've figured out how to adapt the problem to my way of thinking.
Also, sometimes my "I know an answer" is wrong - this means that I still have to go through the "manual" effort of thinking, to verify the answer, and I'm using the slow graphics card to run a mostly single-threaded process. Sometimes the answer is too hard to verify either way! (Hey, look at my username; I've been pursuing themes on a crackpot physics for twenty-five years, and I'm not particularly any closer to being able to determine whether it is true or false!)
In practice, in the real world, what this translates to is: I'm often no faster at providing a "good" answer than a merely above-average person, because, while I know -an- answer, it will take me just as long to verify whether or not it is a good answer as it takes a merely above-average person to go through the manual effort of finding an answer and then verifying it!
Also, my answers are often ... strange. Not wrong, and I can find answers to problems other people find intractable, or find a way to do something in way less time than somebody else - but on rare occasion, I can't find the answer that an average person can spot immediately, and much more frequently, I find an answer that takes way -more- time than the obvious-to-other-people solution.
What I conclude is that what makes me a "genius" is context - I am in fact likely merely somewhat above-average, but the problems I find difficult are -different- from those other people find difficult. Imagine, for a moment, that everybody is given a map of the world, which maps, let's say, 5% of the territory. But 99% of that 5% is in common; ask a hundred people, and 99 of them will know where Canada is. My map is only somewhat above average in size, but it covers an entirely different geography - I couldn't tell you where Canada is, but I know where Tucson is, something that is on less than .01% of the maps out there.
If you need to get to Canada, you can ask just about anybody, and they can tell you how to get there. So, even though I don't have Canada on my map, this mostly doesn't present me any problems, except when I'm trying to get to Alaska and somebody tells me to just drive through Canada.
But if you need to go to Tucson, it's hard to find somebody who knows where it is - whereas I can tell you immediately. Nobody ever asks me how to get to Canada - why would they? - but everybody asks me how to get to Tucson, so I look like I know a lot. And IQ tests really reward knowing how to get to Tucson, and don't bother asking about Canada at all, so - I'm a genius. And because everyone knows where Canada is, I benefit, from an intellectual perspective, as much from having ordinary people around me as they benefit from having a "genius" around them.
But I'm in the same boat as anybody else when I need to get to Jupiter; nobody has a map that says how to get there.
Take a step back and try rereading what I wrote in a charitable light, because it appears you have completely misconstrued what I was saying.
A major part of the "cooperation" involved here is in being able to cooperate with yourself. In an environment with a well-mixed group of bots each employing differing strategies, and some kind of reproductive rule (if you have 100 utility, say, spawn a copy of yourself), Cooperate-bots are unlikely to be terribly prolific; they lose out against many other bots.
In such an environment, a stratagem of defecting against bots that defect against cooperate-bot is a -cheap- mechanism of coordination; you can coordinate with other "Selfish Altruist" bots, and cooperate with them, but you don't take a whole lot of hits from failing to defect against cooperate-bot. Additionally, you're unlikely to run up against very many bots that cooperate with cooperate-bot, but defect against you. As a coordination strategy, it is therefore inexpensive.
And if "computation time" is considered as an expense against utility, which I think reasonably should be the case, you're doing a relatively good job minimizing this; you have to perform exactly one prediction of what another bot will do. I did mention this was a factor.
Evolution gave us "empathy for the other person", and evolution is a reasonable proxy for a perfectly selfish utility machine, which is probably good evidence that this might be an optimal solution to the game theory problem. (Note: Not -the- optimal solution, but -an- optimal solution, in an ecosystem of optimal solutions.)
Note that it is possible to deceive others by systematically adjusting predictions upward or downward to reflect how desirable it is that other people believe those predictions, in a way which preserves your apparent calibration.
This is true even if you bucket your scores; say you're evaluating somebody's predictive scores. You see that when they assign a 60% probability to an event, that event occurs 60% of the time. This doesn't mean that any -specific- prediction they make of 60% probability will occur 60% of the time, however! They can balance out their predictions by adjusting two different predictions, overestimating the odds of one, and underestimating the odds of another, to give the appearance of perfectly calibrated predictions.
The Brier score is useful for evaluating how good a forecaster is, but this is not the same as evaluating how good any given forecast is. If the Oracle really hates Odysseus, the Oracle could give a forecast that, if believed, results in a worse outcome for Odysseus, and balance this out by giving a forecast to another individual that results in apparent perfect calibration.
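A throwaway sketch of the balancing trick (the 60% bucket and the 0.5/0.7 true probabilities are my own illustrative numbers):

```python
import random

random.seed(0)
forecasts = []  # (reported probability, outcome) pairs

for _ in range(50_000):
    # Report 60% on both events: overestimate the first (true p = 0.5)
    # and underestimate the second (true p = 0.7).
    forecasts.append((0.60, random.random() < 0.5))
    forecasts.append((0.60, random.random() < 0.7))

# The 60% bucket looks perfectly calibrated (~0.60), even though no
# individual forecast in it had a true probability of 60%.
bucket = [outcome for p, outcome in forecasts if p == 0.60]
print(f"60% bucket frequency: {sum(bucket) / len(bucket):.3f}")
```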
How does one correctly handle multi-agent dilemmas, in which you know the other agents follow the same decision theory? My implementation of "UDT" defects in a prisoner's dilemma against an agent that it knows is following the same decision procedure. More precisely: Alice and Bob follow the same decision procedure, and they both know it. Alice will choose between cooperate/defect, then Bob will choose between cooperate/defect without knowing what Alice picked, then the utility will be delivered. My "UDT" decision procedure reasons as follows for Alice: "if I had pre-commited to cooperate, then Bob would know that, so he would defect, therefore I defect". Is there a known way out of this, besides special casing symmetric dilemmas, which is brittle?
My solution, which assumes computation is expensive, is to reason about other agents based on their behavior towards a simplified-model third agent; the simplest possible version of this is "Defect against bots who defect against cooperate-bot, otherwise cooperate" (and this seems relatively close to how humans operate - we don't like people who defect against the innocent).
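A minimal sketch of that simplest version (bots as functions of their opponent; the naming is mine). Two copies of the same procedure end up cooperating without any special-casing of symmetric dilemmas:

```python
def cooperate_bot(opponent):
    return "C"

def judge_by_proxy(opponent):
    # Evaluate the opponent by its behavior toward a simplified model
    # agent (cooperate-bot), rather than by simulating its response to
    # me - which sidesteps the self-referential regress described above.
    return "D" if opponent(cooperate_bot) == "D" else "C"

# Each copy checks how the other treats cooperate-bot; neither defects
# against the innocent, so they cooperate with each other.
print(judge_by_proxy(judge_by_proxy))  # -> C
```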
The point there is that there is no contradiction because the informational content is different. "Which is the baseline" is up to the person writing the problem to answer. You've asserted that the baseline is A vs B; then you've added information that A is actually A1 and A2.
The issue here is entirely semantic ambiguity.
Observe what happens when we remove the semantic ambiguity:
You've been observing a looping computer program for a while, and have determined that it shows three videos. The first video portrays a coin showing tails. The second video portrays two coins; the left coin shows heads, the right coin shows tails. The third video also portrays two coins; the left coin shows heads, the right coin shows heads.
You haven't been paying attention to the frequency, but now, having determined there are three videos you can see, you want to figure out how frequently each video shows up. What are your prior odds for each video?
33/33/33 seems reasonable. I've specified that you're watching videos; the event is which video you are watching, not the events that unfold within the video.
Now, consider an alternative framing: You are watching somebody as they repeat a series of events. You have determined the events unfold in three distinct ways; all three begin the same way, with a coin being flipped. If the coin shows heads, it is flipped again. If the coin shows tails, it is not. What are your prior odds for each sequence of events?
25/25/50 seems reasonable.
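(A quick sanity check of that framing - a throwaway simulation of the flip-again-on-heads process, with outcomes labeled to match the three videos:)

```python
import random
from collections import Counter

random.seed(1)

def episode():
    # Flip once; on heads, flip again; on tails, stop.
    first = random.choice("HT")
    return first if first == "T" else "H" + random.choice("HT")

counts = Counter(episode() for _ in range(100_000))
for seq in ("T", "HT", "HH"):
    print(f"{seq}: {counts[seq] / 100_000:.3f}")  # ~0.50 / 0.25 / 0.25
```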
Now, consider yet another framing: You are shown something on a looping computer screen. You have determined the visuals unfold in three distinct ways; all three begin the same way, with a coin being flipped. If the coin shows heads, it is flipped again. If the coin shows tails, it is not. What are your prior odds here?
Both 25/25/50 and 33/33/33 are reasonable. Why? Because it is unclear whether you are watching a simulation of coin flips or something like prerecorded videos; it is unclear whether you should treat the events within what you are watching as the events, or treat the visuals themselves as the event.
Because it is unclear, I'd lean towards treating the visuals you are watching as the event - that is, assume independence. However, it would be perfectly fair to treat the coin tosses as events also. Or you could split the difference. Prior probabilities are just your best guess given the information you have available - and given that I don't have access to all the information you have available, both options are fair.
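(One way to make "split the difference" precise - the mixture weight q is just my illustrative notation:)

$$P(T, HT, HH) = q \cdot \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right) + (1 - q) \cdot \left(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}\right)$$

where q is your credence in the "prerecorded videos" model; q = 1/2 gives roughly 42/29/29.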
Now, in this context, the semantic ambiguity you have introduced looks like this:
You're told you are going to watch a computer program run, and what you see will begin with a coin being flipped, showing heads or tails. What are your probabilities that it will show heads or tails?
Okay, 50/50. Now, if you see the coin shows heads, you will see that it is flipped again; we now have three possibilities: HT, HH, and T. What are your probabilities for each event?
Notice: You didn't specify enough to know what the relevant events we're assigning probabilities to even are! We're in the third scenario; we don't know if it's a video, in which case the relevant event is "Which video we are watching", or if it is a simulation, in which case the relevant event is "The outcome of each coin toss." Either answer works, or you can split the difference, because at this point a large part of the probability-space is devoted, not to the events unfolding, but towards the ambiguity in what events we're even evaluating.
Another crackpot physics thing:
My crackpot physics just got about 10% less crackpot. As it transpires, one of the -really weird- things in my physics, which I thought of as a negative dimension, already exists in mathematics - it's a Riemann Sphere. (Thank you, Pato!)
This "really weird" thing is kind of the underlying topology of the universe in my crackpot physics - I analogized the interaction between this topology and mass once to an infinite series of Matryoshka dolls, where every other doll is "inside out and backwards". Don't ask me to explain that; that entire avenue of "attempting to communicate this idea" was a complete and total failure, and it was only after drawing a picture of the topology I had in mind that someone (Pato) observed that I had just drawn a somewhat inaccurate picture of a Riemann Sphere. (I drew it as a disk in which the entire boundary was the same point, 0, with dual infinities coinciding at the origin. I guess, in retrospect, a sphere was a more obvious way of describing that.)
If we consider that the points are not evenly allocated over the surface of the sphere - they're concentrated at the poles (each of which is simultaneously 0 and infinity; the mapping is ambiguous) - and we draw a line such that it crosses the same number of points with each revolution, we get something like a logarithmic spiral. (Well, it's a logarithmic spiral under the "disk" interpretation; in the spherical interpretation it's a loxodrome, the spherical spiral whose stereographic projection is a logarithmic spiral.)
If we consider the bundle of lines connecting the poles, and use this constant-measure-per-revolution spiral to describe their path, I think that's about ... 20% of the way to actually converting the insanity in my head into real mathematics. Each of these lines is "distance" (or, alternatively, "time" - it depends on which of the two charts you employ). The bundle of these spirals provides one dimension of rotation; there's a mathematical way of extracting a second dimension of rotation, to get a three-dimensional space, but I don't understand it at an intuitive level yet.
A particle's perspective is "constantly falling into an infinity"; because of the hyperbolic nature of the space, I think a particle always "thinks" it is at the equator - it never actually gets any closer. Because the lines describe a spiral, the particle is "spinning". Because of the nature of the geometry of the sphere, this spin expresses itself as a spinor, or at least something analogous to one.
Also, apparently, Riemann Spheres are already used in both relativistic vacuum field equations and quantum mechanics. Which, uh, really annoys me, because I'm increasingly certain there is "something" here, and increasingly annoyed that nobody else has apparently just sat down and tried to unify the fields in what, to me, is the most obvious bloody way to unify them: just assume they're all curvature, and that the curvature varies like a decaying sine wave (like "sin(ln(x))/x", which exhibits exactly the kind of decay I have in mind). Logarithmic decay of frequency over distance ensures that there is a scalar symmetry, as does a linear decay of amplitude over distance.
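(To make the "scalar symmetry" claim concrete, using the toy equation above: rescaling distance by one full period reproduces the same profile up to a constant amplitude factor:)

$$f(x) = \frac{\sin(\ln x)}{x}, \qquad f\!\left(e^{2\pi} x\right) = \frac{\sin(\ln x + 2\pi)}{e^{2\pi} x} = e^{-2\pi} f(x).$$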
Yes, I'm aware of the intuitive geometry involved in an inverse-square law; I swear that the linear decay makes geometric sense too, given the topology in my head. Rotation of a logarithmic spiral gives rise to a linear rescaling relative to the arclength of that rotation. Yes, I'm aware that the inverse-square law also has lots of evidence - but it also has lots of evidence against it, which we've attempted to patch by assuming unobserved mass that precisely accounts for the observed anomalies. I posit that the sinusoidal wave in question has ranges wherein the amplitude decays approximately linearly, which creates the apparent inverse-square behavior for certain ranges of distances - and because these regions of space are where matter tends to accumulate, having the most stable configurations, they're disproportionately where all of our observations are made. It's in the edge cases - where the inverse-square relationship begins to break down (whether it really does, or only apparently does) and the configurations become less stable - that we begin to observe deviations.
I'm still working on mapping my "sin(ln(x))/x" equation (this is not the correct equation, I don't think, it's just an equation that kind of looks right for what's in my head, and it gave me hints about where to start looking) to this structure; there are a few options, but none stand out yet as obviously correct. The spherical logarithmic spiral is a likely candidate, but figuring out the definition of the spiral that maintains constant "measure" with each rotation requires some additional understanding on my part.