I've been in two high-stakes bad-vibe situations. (In one of them, someone else initially got the bad vibes, but I know enough details to comment on it.) In both cases, asking around would have revealed the issue. However, in both cases the people who knew the problematic person well had either a good impression of them or a very bad impression of them. That's because there's a pattern where someone who's problematic in some way is also charismatic, or good at making up for it in other ways, etc.
Or in the words of Sean Carroll's Poetic Naturalism:
A "way of talking" is a map, and "the world" is the territory.
The orthogonality thesis doesn't say anything about intelligences that have no goals. It says that an intelligence can have any specific goal. So I'm not sure you've actually argued against the orthogonality thesis.
And English has it backwards. You can see the past, but not the future. The thing which just happened is most clear. The future comes at us from behind.
Here's the reasoning I intuitively want to apply:
where X = "you roll two 6s in a row by roll N", Y = "you roll at least two 6s by roll N", and Z = "the first N rolls are all even".
This is valid, right? And not particularly relevant to the stated problem, due to the "by roll N" qualifiers mucking up the statements in complicated ways?
Sure. For simplicity, say you play two rounds of Russian Roulette, each with a 60% chance of death, and you stop playing if you die. What's the expected value of YouAreDead at the end?
So the expected value of the boolean YouAreDead random variable is 0.84.
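A minimal sketch of that calculation, assuming the setup above (each round kills with probability 0.6, and you stop playing once dead). The function name `prob_dead` is just for illustration:

```python
def prob_dead(rounds: int, p_death: float) -> float:
    """Probability of being dead after at most `rounds` rounds, stopping at death."""
    alive = 1.0  # probability of still being alive before the current round
    dead = 0.0   # accumulated probability of having died so far
    for _ in range(rounds):
        dead += alive * p_death   # die in this round (only possible if still alive)
        alive *= 1.0 - p_death    # survive this round
    return dead

# Two rounds at 60% each: 0.6 + 0.4 * 0.6, i.e. about 0.84
print(prob_dead(2, 0.6))
```

Since YouAreDead is a boolean, its expected value is just this probability; it can never exceed 1 no matter how many rounds you add.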
Now say you're monogamous and go on two dates, each with a 60% chance to go well, and if they both go well then you pick one person
Probability of at least two successes: ~26%
My point is that in some situations, "two successes" doesn't make sense. I picked the dating example because it's cute, but for something more clear-cut, imagine you're playing Russian Roulette with 10 rounds, each with a 10% chance of death. There's no such thing as "two successes"; you stop playing once you're dead. The "are you dead yet" random variable is a boolean, not an integer.
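To put numbers on the 10-rounds-at-10% case, here's a rough sketch using the binomial distribution. It assumes independent trials; `prob_at_least` is a made-up helper name:

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(at least k successes) in n independent trials, each succeeding with probability p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# The boolean question: did the event happen at all within 10 tries?
p_any = prob_at_least(1, 10, 0.1)
# The counting question, which is only meaningful when repeats are possible.
p_two = prob_at_least(2, 10, 0.1)
print(f"at least one: {p_any:.3f}, at least two: {p_two:.3f}")
```

The first number (roughly 65%) is the one that applies to Russian Roulette. The second (roughly 26%) only means something in situations where a second success can actually happen.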
If you're monogamous and go to multiple speed dating events and find two potential partners, you end up with one partner. If you're polyamorous and do the same, you end up with two partners.
One way to think of it is whether you will stop trying after the first success. Though that isn't always the distinguishing feature. For example, you might start 10 job interviews at the same time, even though you'll take at most one job.
However, it is true that doing something with a 10% success rate 10 times will net you an average of 1 success.
For the easier-to-work-out case of doing something with a 50% success rate 2 times:
Gives an average of 1 success.
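If you'd rather check this by brute force, here's a small sketch that enumerates every possible sequence of outcomes and averages the number of successes. It assumes independent attempts, and the function names are just for illustration:

```python
from itertools import product

def expected_successes(n: int, p: float) -> float:
    """Average number of successes over all 2**n outcome sequences, weighted by probability."""
    total = 0.0
    for outcome in product([True, False], repeat=n):
        k = sum(outcome)                      # number of successes in this sequence
        prob = p**k * (1 - p)**(n - k)        # probability of this exact sequence
        total += prob * k
    return total

def prob_at_least_one(n: int, p: float) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    return 1 - (1 - p)**n

print(expected_successes(2, 0.5), prob_at_least_one(2, 0.5))
print(expected_successes(10, 0.1), prob_at_least_one(10, 0.1))
```

In both cases the average count comes out to 1 (up to floating-point error), while the chance of getting at least one success is noticeably below 100%.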
Of course this only matters for the sort of thing where 2 successes is better than 1 success:
IQ over median does not correlate with creativity over median
That's not what that paper says. It says that IQ over 110 or so (quite above median) correlates less strongly (but still positively) with creativity. In Chinese children, age 11-13.
And for a visceral description of a kind of bullying that's plainly bad, read the beginning of Worm: https://parahumans.wordpress.com/2011/06/11/1-1/
I double-downvoted this post (my first ever double-downvote) because it crosses a red line by advocating for verbal and physical abuse of a specific group of people.
Alexej: this post gives me the impression that you started with a lot of hate and went looking for justifications for it. But if you have some real desire for truth seeking, here are some counterarguments:
Yeah, I think “computational irreducibility” is an intuitive term pointing to something which is true, important, and not-obvious-to-the-general-public. I would consider using that term even if it had been invented by Hitler and then plagiarized by Stalin :-P
No, Rice's theorem is really not applicable. I have a PhD in programming languages, and feel confident saying so.
Let's be specific. Say there's a mouse named Crumbs (this is a real mouse), and we want to predict whether Crumbs will
Rice’s theorem (a.k.a. computational irreducibility) says that for most algorithms, the only way to figure out what they’ll do with certainty is to run them step-by-step and see.
Rice's theorem says nothing of the sort. Rice's theorem says:
For every non-trivial semantic property P,
For every program Q that purports to check if an arbitrary program has property P,
There exists a program R such that Q(R) is incorrect:
Either P holds of R but Q(R) returns false,
or P does not hold of R but Q(R) returns true
Notice that the tricky program R
that's causing your
still not have an answer to whether it’s spinning clockwise or counterclockwise
More simply (and quite possibly true), Nobuyuki Kayahara rendered it spinning either clockwise or counterclockwise, lost the source, and has since forgotten which way it was going.
I like “veridical” mildly better for a few reasons, more about pedagogy than anything else.
That's a fine set of reasons! I'll continue to use "accurate" in my head, as I already fully feel that the accuracy of a map depends on which territory you're choosing for it to represent. (And a map can accurately represent multiple territories, as happens a lot with mathematical maps.)
Another reason is I’m trying hard to push for a two-argument usage
...Do you see the Spinning Dancer going clockwise? Sorry, that's not a veridical model of the real-world thing
Why "veridical" instead of simply "accurate"? To me, the accuracy of a map is how well it corresponds to the territory it's trying to map. I've been replacing "veridical" with "accurate" while reading, and it's seemed appropriate everywhere.
...Do you see the Spinning Dancer going clockwise? Sorry, that’s not a veridical model of the real-world thing you’re looking at. [...] after all, nothing in the real world of atoms is rotating in 3D.
Here's a simple argument that simulating universes based on Turing machine number can give manipulated results.
Say we lived in a universe much like this one, except that:
So we send a rocket to the center of the universe and leave a plaque saying "the answer to all your questions is Spongebob". Now any aliens in other universes that simulate our universe and ask "what's in the center of that universe
The feedback is from Lean, which can validate attempted formal proofs.
This is one of the bigger reasons why I really don’t like RLHF—because inevitably you’re going to have to use a whole bunch of Humans who know less-than-ideal amounts about philosophy, pertaining to Ai Alignment.
What would these humans do differently, if they knew about philosophy? Concretely, could you give a few examples of "Here's a completion that should be positively reinforced because it demonstrates correct understanding of language, and here's a completion of the same text that should be negatively reinforced because it demonstrates incorrect un...
As you're probably aware, the fine tuning is done by humans rating the output of the LLM. I believe this was done by paid workers, who were probably given a list of criteria like that it should be helpful and friendly and definitely not use slurs, and who had probably not heard of Wittgenstein. How do you think they would rate LLM outputs that demonstrated "incorrect understanding of language"?
I have (tried to) read Wittgenstein, but don't know what outputs would or would not constitute an "incorrect understanding of language". Could you give some examples? The question is whether the tuners would rate those examples positively or negatively, and whether examples like those would arise during fine tuning.
You say "AI", though I'm assuming you're specifically asking about LLMs (large language models) like GPT, Llama, Claude, etc.
LLMs aren't programmed, they're trained. None of the code written by the developers of LLMs has anything to do with concepts, sentences, dictionary definitions, or different languages (e.g. English vs. Spanish). The code only deals with general machine learning, and streams of tokens (which are roughly short chunks of text, such as letters or word fragments, encoded as numbers).
The LLM is trained on huge corpuses of text. The LLM learns concepts, and what a sentence is,...
However, if I already know that I have the disease, and I am not altruistic to my copies, is playing such a game a winning move for me?
Correct. But if you don't have the disease, you're probably also not altruistic to your copies, so you would choose not to participate. Leaving the copies of you with the disease isolated and unable to "trade".
Not "almost no gain". My point is that it can be quantified, and it is exactly zero expected gain under all circumstances. You can verify this by drawing out any finite set of worlds containing "meditators", and computing the expected number of disease losses minus disease gains as:
num(people with disease)*P(person with disease meditates)*P(person with disease who meditates loses the disease) - num(people without disease)*P(person without disease meditates)*P(person without disease who meditates gains the disease)
My point is that this number is always exactly zero. If you doubt this, you should try to construct a counterexample with a finite number of worlds.
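Here's a rough sketch of that calculation. The one modeling assumption I'm adding (it isn't spelled out above) is that each person who meditates wakes up as a uniformly random member of the pool of meditators. The function and parameter names are hypothetical:

```python
from fractions import Fraction

def expected_net_change(n_diseased: int, n_healthy: int,
                        p_meditate_diseased: Fraction,
                        p_meditate_healthy: Fraction) -> Fraction:
    """Expected (disease losses - disease gains) under uniform mixing of meditators."""
    d = n_diseased * p_meditate_diseased   # expected number of diseased meditators
    h = n_healthy * p_meditate_healthy     # expected number of healthy meditators
    if d + h == 0:
        return Fraction(0)                 # nobody meditates, so nothing changes
    p_lose = Fraction(h) / (d + h)         # assumed: a diseased meditator lands in a healthy slot
    p_gain = Fraction(d) / (d + h)         # assumed: a healthy meditator lands in a diseased slot
    return d * p_lose - h * p_gain

# Even if the diseased always meditate and the healthy almost never do, the net is zero.
print(expected_net_change(10, 990, Fraction(1), Fraction(1, 100)))
```

Changing who chooses to meditate changes the two probabilities in opposite directions, so under this assumption the two terms always cancel.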
My point still stands. Try drawing out a specific finite set of worlds and computing the probabilities. (I don't think anything changes when the set of worlds becomes infinite, but the math becomes much harder to get right.)
There is a 0.001 chance that someone who did not have the disease will get it. But he can repeat the procedure.
No, that doesn't work. It invalidates the implicit assumption you're making that the probability that a person chooses to "forget" is independent of whether they have the disease. Ultimately, you're "mixing" the various people who "forgot", and a "mixing" procedure can't change the proportion of people who have the disease.
When you take this into account, the conclusion becomes rather mundane. Some copies of you can gain the disease, while a proportional number lose it.
I think formalizing it in full will be a pretty nontrivial undertaking, but formalizing isolated components feels tractable, and is in fact where I’m currently directing a lot of my time and funding.
Great. Yes, I think that's the thing to do. Start small! I (and presumably others) would update a lot from a new piece of actual formal mathematics from Chris's work. Even if that work was, by itself, not very impressive.
(I would also want to check that that math had something to do with his earlier writings.)
...My current understanding is that he believes that
"gesture at something formal" -- not in the way of the "grammar" it isn't. I've seen rough mathematics and proof sketches, especially around formal grammars. This isn't that, and it isn't trying to be. There isn't even an attempt at a rough definition for which things the grammar derives.
I think Chris’s work is most valuable to engage with for people who have independently explored philosophical directions similar to the ones Chris has explored
...A big part of Chris's preliminary setup is around how to sidestep the issues around making the sets well-ordered
tl;dr: a spot check calls bullshit on this.
I know a bunch about formal languages (PhD in programming languages), so I did a spot check on the "grammar" described on page 45. It's described as a "generative grammar", though instead of words (sequences of symbols) it produces "L_O spacial relationships". Since he uses these phrases to describe his "grammar", and they have their standard meaning because he listed their standard definition earlier in the section, he is pretty clearly claiming to be making something akin to a formal grammar.
My spot check is then...
How did you always find me?
Have you tried applying your models to predict the day's weather, or what your teacher will be wearing that day? I bet not: they wouldn't work very well. Models have domains in which they're meant to be applied. More precise models tend to have more specific domains.
Making real predictions about something, like what the result of a classroom experiment will be even if the pendulum falls over, is usually outside the domain of any precise model. That's why your successful models are compound models...
"There's no such thing as 'a Bayesian update against the Newtonian mechanics model'!" says a hooded figure from the back of the room. "Updates are relative: if one model loses, it must be because others have won. If all your models lose, it may hint that there's another model you haven't thought of that does better than all of them, or it may simply be that predicting things is hard."
"Try adding a couple more models to compare against. Here's one: pendulums never swing. And here's another: Newtonian mechanics is correct but experiments are hard to perform ...
Are we assuming things are fair or something?
I would have modeled this as von Neumann getting 300 points and putting 260 of them into the maths and sciences and the remaining 40 into living life and being well adjusted.
It's a little hard to tell from the lack of docs, but you're modelling dilemmas with Bayesian networks? I considered that, but wasn't sure how to express Sleeping Beauty nicely, whereas it's easy to express (and gives the right answers) in my tree-shaped dilemmas. Have you tried to express Sleeping Beauty?
And have you tried to express a dilemma like smoking lesion where the action that an agent takes is not the action their decision theory tells them to take? My guess is that this would be as easy as having a chain of two probabilistic events...
I have a healthy fear of death; it's just that none of it stems from an "unobserved endless void". Some of the specific things I fear are:
The point I was making in this thread was that "unobserved endless void" is not on this list, I don't know how to picture it
What's the utility function of the predictor? Is there necessarily a utility function for the predictor such that the predictor's behavior (which is arbitrary) corresponds to maximizing its own utility? (Perhaps this is mentioned in the paper, which I'll look at.)
EDIT: do you mean to reduce a 2-player game to a single-agent decision problem, instead of vice-versa?
I was not aware of Everitt, Leike & Hutter 2015, thank you for the reference! I only delved into decision theory a few weeks ago, so I haven't read that much yet.
Would you say that this is similar to the connection that exists between fixed points and Nash equilibria?
Nash equilibria come from the fact that your action depends on your opponent's action, which depends on your action. When you assume that each player will greedily change their action if it improves their utility, the Nash equilibria are the fixpoints at which no player changes their a...
My solution, which assumes computation is expensive
Ah, so I'm interested in normative decision theory: how one should ideally behave to maximize their own utility. This is what e.g. UDT and FDT are aiming for. (Keep in mind that "your own utility" can, and should, often include other people's utility too.)
Minimizing runtime is not at all a goal. I think the runtime of the decision theories I implemented is something like doubly exponential in the number of steps of the simulation (the number of events in the simulation is exponential in its duration; ea...
Yeah, exactly. For example, if humans had a convention of rounding probabilities to the nearest 10% when writing them, then baseline GPT-4 would follow that convention and it would put a cap on the maximum calibration it could achieve. Humans are badly calibrated (right?) and baseline GPT-4 is mimicking humans, so why is it well calibrated? It doesn't follow from its token stream being well calibrated relative to text.
I like the idea of Peacemakers. I even had the same idea myself---to make an explicitly semi-cooperative game with a goal of maximizing your own score but every player having a different scoring mechanism---but haven't done anything with it.
That said, I think you're underestimating how much cooperation there is in a zero-sum game.
...If you offer a deal, you must be doing it because it increases your chance of winning, but only one person can win under the MostPointsWins rule, so that deal couldn’t be very good for me, and I’ll always suspect your deal of be
Expanding on this, there are several programming languages (Coq, Idris with its totality checker, etc.) whose type checking ensures that every accepted program will halt when it's run. One way to view a type system is as an automated search for a proof that your program is well-typed (and a type error is a counter-example). In a language like Coq, or for functions that Idris accepts as total, a program being well-typed implies that it halts. So machine-generated proofs that programs halt aren't just theoretically possible, they're used extensively by some languages.
I too gathered people's varied definitions of consciousness for amusement, though I gathered them from the Orange Site:
[The] ability to adapt to environment and select good actions depending on situation, learning from reward/loss signals.
Consciousness is the ability of an organism to predict the future
...The problem is that we want to describe consciousness as "that thing that allows an organism to describe consciousness as 'that thing that allows an organism to describe consciousness
It's a clever scheme, but you can accomplish nearly the same thing by having a physical shutdown switch for the server room and giving a key to N people, and the shutdown switch was never the weak part. Here are some ways to escape your encryption, if the AI can interact with the outside world:
Clarification: pieces can't move "over" the missing squares. Where the words end, the world ends. You cannot move forward in an absence of space.
Woah, woah, slow down. You're talking about the edge cases but have skipped the simple stuff. It sounds like you think it's obvious, or that we're likely to be on the same page, or that it should be inferrable from what you've said? But it's not, so please say it.
Why is growing up so important?
Reading between the lines, are you saying that the only reason that it's bad for a human baby to be in pain is that it will eventually grow into a sapient adult? If so: (i) most people, including myself, both disagree and find that view morally reprehensible, (ii) th...
By far the biggest and most sudden update I've ever had is Dominion, a documentary on animal farming:
It's like... I had a whole pile of interconnected beliefs, and if you pulled on one it would snap most of the way back into place after. And Dominion pushed the whole pile over at once.
Meta comment: I'm going to be blunt. Most of this sequence has been fairly heavily downvoted. That reads to me as this community asking to not have more such content. You should consider not posting, or posting elsewhere, or writing many fewer posts of much higher quality (e.g. spending more time, doing more background research, asking someone to proofread). As a data point, I've only posted a couple times, and I spent at least, I dunno, 10+ hours writing each post. As an example of how this might apply to you, if you wrote this whole sequence as a single "reference on biases" and shared that, I bet it would be better received.
The fact that you so naturally used the word "version" here (it was essentially invisible, it didn't feel like a terminology choice at all) suggests that "version" would be a good term to use instead of "lens". Downside being that it's a sufficiently common word that it doesn't sound like a Term of Art.