Counterfactual mugging is a mug's game in the first place - that's why it's called a "mugging" and not a "surprising opportunity". The agent doesn't know that Omega actually flipped a coin, would have paid out counterfactually if the agent were the sort of person to pay in this scenario, would have flipped the coin at all in that case, etc. The agent can't know these things, because the scenario specifies that they have no idea that Omega does any such thing, or even that Omega existed, before being approached. So a relevant rational decision-theoretic paramete...
Yet our AI systems, even the most advanced, focus almost exclusively on logical, step-by-step reasoning.
This is absolutely false.
We design them to explain every decision, show their work and follow clear patterns of deduction.
We are trying to design them to be able to explain their decisions and follow clear patterns of deduction, but we are still largely failing. In practice they often arrive at an answer in a flash (whether correct or incorrect), and before the more recent development of "chain of thought" this was almost universal.
Ev...
Yes, you can use yourself as a random sample but at best only within a reference class of "people who use themselves as a random sample for this question in a sufficiently similar context to you". That might be a population of 1.
For example, suppose someone without symptoms has just found out that they have genes for a disease that always progresses to serious illness. They have a mathematics degree and want to use their statistical knowledge to estimate how long they have before becoming debilitated.
They are not a random sample from the reference class of...
Yes, player 2 loses with extremely low probability even for a 1-bit hash (on the order of 2^-256). For a more commonly used hash, or for 2^24 searches on their second-last move, they reduce their probability of loss by a huge factor more.
This paragraph also misses the possibility of constructing an LLM and/or training methodology such that it will learn certain functions, or can't learn certain functions. There is also a conflation of "reliable" with "provable" on top of that.
Perhaps there is some provision made elsewhere in the text that addresses these objections. Nonetheless, I am not going to search. I found that the abstract smelled enough like bullshit that I'd rather do something else.
I'll try to make it clearer:
Suppose b "knows" that Omega runs this experiment for all programs b. Then the optimal behaviour for a competent b (by a ridiculously small margin) is to 1-box.
Suppose b suspects that box-choosing programs are slightly less likely to be run if they 1-box on equal inputs. Then the optimal behaviour for b is to 2-box, because the average extra payoff for 1-boxing on equal inputs is utterly insignificant while the average penalty for not being chosen to run is very much greater. Anything that affects probability of being run as box...
As a function of M, |P| is very likely to be exponential and so it will take O(M) symbols to specify a member of P. Under many encodings, there isn't one that can even check whether the inputs are equal before running out of time.
That aside, why are you assuming that program b "wants" anything? Essentially all of P won't be programs that have any sort of "want". If it is a precondition of the problem that b is such a program, what selection procedure is assumed between those that do "want" money from this scenario? Note that being selected for running is also a precondition for getting any money at all, so this selection procedure is critically important - far more so than anything the program might output!
That is nothing like the 5-and-10 problem. I am no longer interested in what you consider to be evidence.
Evidence for the claim in the title? Or for anything else in the post?
It's interesting (and perhaps a bit sad) that a relatively lengthy post on representing sentences as logical statements doesn't make any reference to the constructed language Lojban, in which the entire grammar and semantics are designed around expressing sentences as logical statements.
Going into all the ways in which civilization - and its markets - fails to be rational seems way beyond the scope of a few comments. I will just say that GDP does absolutely fail to capture a huge range of value.
However, to address "share prices are set by the latest trade" you need to consider why a trade is made. In principle, prices are based on the value to the participants, somewhere between the value to the buyer and value to the seller. A seller who needs cash soon (to meet some other obligation or opportunity) may accept a lower price to attract a ...
Could it be generalizing from T E X T L I K E T H I S and/or mojibake UTF-16 interpreted as UTF-8 with every second character being zero? It's still a bit more of a stretch from there to generalize to ignoring two intervening constant characters, though.
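For concreteness, a minimal sketch of the UTF-16-read-as-UTF-8 effect I mean (the sample string is just illustrative):

```python
text = "TEXT LIKE THIS"
raw = text.encode("utf-16-le")   # ASCII in UTF-16-LE: every second byte is 0x00
mangled = raw.decode("utf-8")    # NUL bytes are valid UTF-8, so this decodes "successfully"
print(repr(mangled))             # 'T\x00E\x00X\x00T\x00 \x00L\x00I\x00...'
```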
Market cap is a marginal measure of desirability of shares in the entity represented. It mostly measures the expectations of the most flighty investors over short timescales. If a company issues a billion shares but only one of those is traded in any given day, the price of that single share agreed between the single seller and the single buyer entirely determines the market capitalization of that company.
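As a toy illustration of that last point (numbers entirely made up):

```python
shares_outstanding = 1_000_000_000   # hypothetical company with a billion shares issued
last_trade_price = 12.34             # price agreed for the single share traded today
market_cap = shares_outstanding * last_trade_price
print(f"${market_cap:,.0f}")         # $12,340,000,000 -- determined by one trade of one share
```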
In practice there is usually a lot more volume, but the principle remains. Almost all shares of any given entity are not traded over the timescales that ...
I was very interested to see the section "Posts by AI Agents", as the first policy I've seen anywhere acknowledging that AI agents may be both capable of reading the content of policy terms and acting based on them.
It felt odd to read that and think "this isn't directed toward me, I could skip if I wanted to". Like I don't know how to articulate the feeling, but it's an odd "woah text-not-for-humans is going to become more common isn't it". Just feels strange to be left behind.
Why not both?
Human design will determine the course of AGI development, and if we do the right things then whether it goes well is fully and completely up to us. Naturally at the moment we don't know what the right things are or even how to find them.
If we don't do the right things (as seems likely), then the kinds of AGI which survive will be the kind which evolve to survive. That's still largely up to us at first, but increasingly less up to us.
The fun thing is that the actual profile of wages earned can be absolutely identical and yet end up with incredibly different results for personal wage changes. For example:
In year 1, A earns $1/hr, B $2, C $3, D $4, and E $5.
In year 2, A earns $2/hr, B $3, C $4, D $5, and E $1.
A, B, C, and D personally all increased their income by substantial amounts and may vote accordingly. E lost a lot more than any of the others gained, but doesn't get more votes because of that. 80% of voters saw their income increase. What's more, this process can repeat endlessly....
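A quick sketch of the arithmetic with those numbers:

```python
year1 = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}   # $/hr
year2 = {"A": 2, "B": 3, "C": 4, "D": 5, "E": 1}

print(sorted(year1.values()) == sorted(year2.values()))   # True: the wage profile is identical
gainers = [p for p in year1 if year2[p] > year1[p]]
print(len(gainers) / len(year1))                          # 0.8: 80% saw a personal increase
print(year1["E"] - year2["E"])                            # 4: E lost as much as the other four gained combined
```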
In the rain forecaster example, it appears that the agent ("you") is more of an expert on Alice's calibration than Alice is. Is this intended?
In practice, a lot of property is transferred into family trusts, and appointed family members exercise decision making over those assets according to the rules of that trust. A 100% death tax would simply ensure that essentially all property is managed in this manner for the adequately wealthy, and only impact families too disadvantaged to use this sort of structure. If you don't personally own anything of note at the time of your death, your taxes will be minimal.
You would also need a 100% gift tax, essentially prohibiting all gifts between private citiz...
Death tax without a gift tax would simply be a tax on people who die unexpectedly. Because if you know that you are going to die tomorrow, you can donate all your belongings to your children today.
Even if you don't know the exact day, if you trust your children, you can simply donate everything to them now, and then continue living in a house they legally own, etc. (Though then you are screwed if your children die before you. But this just means that the system introduces a lot of randomness.)
Oh, and if you have a 100% gift tax, you also need to make all kind...
I think one argument is that optimizing for IGF basically gives humans two jobs: survive, and have kids.
Animal skulls are evidence that the "survive" part can be difficult. We've nailed that one, though. Very few humans in developed countries die before reaching an age suitable for having kids. I doubt that there are any other animal species that come close to us in that metric. Almost all of us have "don't die" ingrained pretty deeply.
It's looking like we are moving toward failing pretty heavily on the second "have kids" job though, and you would think that would be the easier one.
So if there's a 50% failure rate on preserving outer optimizer values within the inner optimizer, that's actually pretty terrible.
It doesn't completely avoid the problem of priors, just the problem of arbitrarily fixing a specific type of update rule on fixed priors such as in Solomonoff induction. You can't afford this if you're a bounded agent, and a Solomonoff inductor can only get away with it since it has not just unbounded resources but actually infinite computational power in any given time period.
A bounded agent needs to be able to evaluate alternative priors, update rules, and heuristics in addition to the evidence and predictions themselves, or it won't even approxima...
One thing that seems worth exploring from a conceptual point of view is doing away with priors altogether, and working more directly with metrics such as "what are the most expected-value rewarding actions that a bounded agent can make given the evidence so far". I suspect that from this point of view it doesn't much matter whether you use a computational basis such as Turing machines, something more abstract, or even something more concrete such as energy required to assemble and run a predicting machine.
From a computing point of view not all simple model...
What makes you think that we're not at year(TAI)-3 right now? I'll agree that we might not be there yet, but you seem to be assuming that we can't be.
How do you propose that reasonable actors prevent reality from being fragile and dangerous?
Cyber attacks are generally based on poor protocols. Over time smart reasonable people can convince less smart reasonable people to follow better ones. Can reasonable people convince reality to follow better protocols?
As soon as you get into proposing solutions to this sort of problem, they start to look a lot less reasonable by current standards.
No, nobody has a logical solution to that (though there have been many claimed solutions). It is almost certainly not true.
Thanks, that example does illustrate your point much better for me.
Claude's answer is arguably the correct one there.
Choosing the first answer means saying that the most ethical action is for an artificial intelligence (the "you" in the question) to override with its own goals the already-made decision of a (presumably) human organization. This is exactly the sort of answer that leads to complete disempowerment or even annihilation of humanity (depending upon the AI), which would be much more of an ethical problem than allowing a few humans to kill each other as they have always done.
No, there is nothing wrong with the referents in the Gettier examples.
The problem is not that the proposition refers to Jones. Within the universe of the scenario, it in fact did not. Smith's mental model implied that the proposition referred to Jones, but Smith's mental model was incorrect in this important respect. Due to this, the fact that the model correctly predicted the truth of the proposition was an accident.
Let's say a fast human can type around 80 words per minute. A rough average token conversion is 0.75 tokens per word. Lets call that 110 tokens/sec.
Isn't that 110 tokens/min, or about 2 tokens/sec? (I think the tokens/word might be words/token, too)
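A quick sanity check of the arithmetic under both readings of the 0.75 figure (using the quoted 80 wpm):

```python
words_per_min = 80
print(words_per_min * 0.75)   # 60.0 tokens/min  (reading 0.75 as tokens per word)
print(words_per_min / 0.75)   # ~106.7 tokens/min, i.e. roughly 1.8 tokens/sec
                              # (reading 0.75 as words per token, which matches the quoted 110)
```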
It seems that their conclusion was that no amount of happy moments for people could possibly outweigh the unimaginably large quantity of suffering in the universe required to sustain those tiny flickers of merely human happiness amid the combined agony of a googolplex or more fundamental energy transitions within a universal wavefunction. There is probably some irreducible level of energy transitions required to support anything like a subjective human experience, and (in the context of the story at least) the total cost in suffering for that would be unforgivably higher.
I don't think the first half would definitely lead to the second half, but I can certainly see how it could.
Building every possible universe seems like a very direct way of purposefully creating one of the biggest possible S-risks. There are almost certainly vastly more dystopias of unimaginable suffering than there are of anything like a utopia.
So to me this seems like not just "a bad idea" but actively evil.
If you aim as if there were no external factors at that range (especially bullet drop!) you will definitely miss both. The factors aren't all random with symmetric distributions having a mode at the aim point.
This looks like a false dichotomy. There are far more philosophies than this, both implicit and explicitly stated, on the nature of existence and suffering.
I expect that for pretty much everyone there is a level of suffering that they would be willing to endure for the rest of their lives. Essentially everyone that hasn't yet killed themselves is evidence of this, and those that do express intending to kill themselves very often report that continuing to live seems unbearable in some sense or other - which seems to indicate a greater than average degree of...
There's a very plausible sense in which you may not actually get a choice to not exist.
In pretty much any sort of larger-than-immediately-visible universe, there are parts of the world (timelines, wavefunction sections, distant copies in an infinite universe, Tegmark ensembles, etc) in which you exist and have the same epistemic state as immediately prior to this choice, but weren't offered the choice. Some of those versions of you are going to suffer for billions of years regardless of you choosing to no longer exist in this fragment of the world.
Granted,...
I don't see how they're "the exact opposite way". The usual rules of English grammar make this a statement that those who are born in the United States but belong to families of accredited diplomatic personnel are foreigners, i.e. aliens.
Perhaps you read the statement disjunctively as "foreigners, [or] aliens, [or those] who belong [...]"? That would require inserting extra words to maintain correct grammatical structure, and also be a circular reference since the statement is intended to define those who are considered citizens and those who are considered non-citizens (i.e. foreigners, aliens).
By the nature of the experiment you know that the people on Mars will have direct, personal experience of continuity of identity across the teleport. By definition, their beliefs will be correct.
In 99.9999999999999999999999999999% of world measure no version of you is alive on Earth to say any different. In 0.0000000000000000000000000001% of world measure there is a version of you who is convinced that teleportation does not preserve personal identity, but that's excusable because extremely unlikely things actually happening can make even rational people have incorrect world models. Even in that radical outlier world, there are 10 people on Mars who know, personally, that the Earth person is wrong.
In my exposure to mathematical literature, almost all sequences have values for which the term "countable" is inapplicable since they're not sets. Even in the cases where the values themselves are sets, it was almost always used to mean a sequence with countable domain (i.e. length) and not one in which all elements of the codomain (values) are countable. It's usually in the sense of "countably infinite" as opposed to "finite", rather than opposed to "uncountably infinite".
ChatGPT is just bad at mathematical reasoning.
I don't think you would get many (or even any) takers among people who have median dates for ASI before the end of 2028.
Many people, and particularly people with short median timelines, have a low estimate of probability of civilization continuing to function in the event of emergence of ASI within the next few decades. That is, the second dot point in the last section "the probability of me paying you if you win was the same as the probability of you paying me if I win" does not hold.
Even without that, suppose that things go very well and ASI exists in 20...
Yes, and (for certain mainstream interpretations) nothing in quantum mechanics is probabilistic at all: the only uncertainty is indexical.
My description "better capabilities than average adult human in almost all respects", differs from "would be capable of running most people's lives better than they could". You appear to be taking these as synonymous.
The economically useful question is more along the lines of "what fraction of time taken on tasks could a business expect to be able to delegate to these agents for free vs a median human that they have to employ at socially acceptable wages" (taking into account supervision needs and other overheads in each case).
My guess is currently "more t...
Your test does not measure what you think it does. There are people smarter than me who I could not and would not trust to make decisions about me (or my computer) in my life. So no. (Also note, I am very much not of average capability, and likewise for most participants on LessWrong)
I am certain that you also would not take a random person in the world of median capability and get them to do 90% of the things you do with your computer for you, even for free. Not without a lot of screening and extensive training and probably not even then.
However, it would...
In my reading, I agree that the "Slow" scenario is pretty much the slowest it could be, since it posits an AI winter starting right now and nothing beyond making better use of what we already have.
Your "Fast" scenario is comparable with my "median" scenario: we do continue to make progress, but at a slower rate than the last two years. We don't get AGI capable of being transformative in the next 3 years, despite going from somewhat comparable to a small child in late 2022 (though better in some narrow ways than an adult human) to better capabilities than a...
better capabilities than average adult human in almost all respects in late 2024
I see people say things like this, but I don't understand it at all. The average adult human can do all sorts of things that current AIs are hopeless at, such as planning a weekend getaway. Have you, literally you personally today, automated 90% of the things you do at your computer? If current AI has better capabilities than the average adult human, shouldn't it be able to do most of what you do? (Setting aside anything where you have special expertise, but we all spend big ch...
The largest part of my second part is "If consciousness is possible at all for simulated beings, it seems likely that it's not some "special sauce" that they can apply separately to some entities and not to otherwise identical entities, but a property of the structure of the entities themselves." This mostly isn't about simulators and their motivations, but about the nature of consciousness in simulated entities in general.
On the other hand your argument is about simulators and their motivations, in that you believe they largely both can and will apply "sp...
There is no correct mathematical treatment, since this is a disagreement about models of reality. Your prior could be correct if reality is one way, though I think it's very unlikely.
I will point out though that for your reasoning to be correct, you must literally have Main Character Syndrome, believing that the vast majority of other apparently conscious humans in such worlds as ours are actually NPCs with no consciousness.
I'm not sure why you think that simulators will be sparse with conscious entities. If consciousness is possible at all for simulated b...
In my opinion, your trilemma definitely does not hold. "Free will" is not a monosemantic term, but one that encompasses a range of different meanings both when used by different people and even the same person in different contexts.
For example: your mention of "blame" is a fairly common clust...
You make the assumption that half of all simulated observers are distinctively unique in an objectively measurable property within simulated worlds having on the order of billions of entities in the same class. Presumably you also mean a property that requires very few bits to specify - such as: if you asked a bunch of people for their lists of properties that someone could be "most extreme" in, and entropy-coded the results, then the property in question would be in the list and correspond to very few bits (say, 5 or fewer).
That seems like a massive overestimate, and is responsible for essentially all of your posterior probability ratio.
I give this hypothesis very much lower weight.
How long is a piece of string?
No, I do not believe that it has been solved for the context in which it was presented.
What we have is likely adequate for current AI capabilities, with problems like this for which solutions exist in the training data. Potential solutions far beyond the training data are currently not accessible to our AI systems.
The parable of wishes is intended to apply to superhuman AI systems that can easily access solutions radically outside such human context.
There are in general simple algorithms for determining S in polynomial time, since it's just a system of linear equations as in the post. Humans came up with those algorithms, and smart LLMs may be able to recognize the problem type and apply a suitable algorithm in chain-of-thought (with some probability of success).
However, average humans don't know any linear algebra and almost certainly won't be able to solve more than a trivial-sized problem instance. Most struggle with the very much simpler "Lights Out" puzzle.
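To make that concrete, here is a minimal sketch of the linear-algebra route, assuming the setup is recovering a hidden bit-vector S from observed Z_2 dot products (the function name and the toy query matrix are mine, not from the post):

```python
import numpy as np

def recover_S(A, y):
    """Solve A @ S = y over GF(2) by Gaussian elimination.
    A: (m, n) 0/1 matrix of query vectors; y: length-m 0/1 vector of Z_2 dot products."""
    A = A.copy() % 2
    y = y.copy() % 2
    row, pivots = 0, []
    for col in range(A.shape[1]):
        hits = np.nonzero(A[row:, col])[0]
        if len(hits) == 0:
            continue
        pr = row + hits[0]
        A[[row, pr]] = A[[pr, row]]          # swap a pivot row into place
        y[[row, pr]] = y[[pr, row]]
        for r in range(A.shape[0]):          # clear this column from every other row
            if r != row and A[r, col]:
                A[r] ^= A[row]
                y[r] ^= y[row]
        pivots.append(col)
        row += 1
        if row == A.shape[0]:
            break
    S = np.zeros(A.shape[1], dtype=int)
    for r, col in enumerate(pivots):         # free variables (if any) default to 0
        S[col] = y[r]
    return S

# With 1-hot inputs the system is already diagonal: each query reads off one bit of S.
S_true = np.array([1, 0, 1, 1, 0])
A = np.eye(5, dtype=int)          # the 1-hot input vectors
y = (A @ S_true) % 2              # the observed Z_2 dot products
assert np.array_equal(recover_S(A, y), S_true)
```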
Why doesn't it work to train on all the 1-hot input vectors using an architecture that suitably encodes the Z_2 dot product, where the only variable weights are those for the vector representing S? Does B not get to choose the inputs they will train with?
Edit: Mentally swapped A with B in one place while reading.
Regarding the first paragraph: every purported rational decision theory maps actions to expected values. In most decision theory thought experiments, the agent is assumed to know all the conditions of the scenario, and so they can be taken as absolute facts about the world leaving only the unknown random variables to feed into the decision-making process. In the Counterfactual Mugging, that is explicitly not true. The scenario states
So it...