Can this be summarized as "don't optimize for what you believe is good too hard, as you might be mistaken about what is good"?
It was both: in the system prompt, the model was instructed to end the conversation if it disagreed with the user, and you could also ask it to end the conversation. It would presumably send an end-of-conversation token, which then made the text box disappear.
i kinda thought that ey's anti-philosophy stance was a bit extreme but this is blackpilling me pretty hard lmao
He actually cites reflective equilibrium here:
Closest antecedents in academic metaethics are Rawls and Goodman's reflective equilibrium, Harsanyi and Railton's ideal advisor theories, and Frank Jackson's moral functionalism.
If Thurston is right here and mathematicians want to understand why some theorem is true (rather than to just know the truth values of various conjectures), and if we "feel the AGI" ... then it seems future "mathematics" will consist in "mathematicians" asking future ChatGPT to explain math to them. Whether something is true, and why. There would be no research anymore.
The interesting question is, I think, whether less-than-fully-general systems, like reasoning LLMs, could outperform humans in mathematical research. Or whether this would require a full AGI that is also smarter than mathematicians. Because if we had the latter, it would likely be an ASI that is better than humans in almost everything, not just mathematics.
I think when people use the term "gradual disempowerment" predominantly in one sense, people will also tend to understand it in that sense. And I think that sense will be rather literal and not the one specifically of the original authors. Compare the term "infohazard" which is used differently (see comments here) from how Yudkowsky was using it.
Unrelated to vagueness they can also just change the framework again at any time.
Reminds me of Schopenhauer's posthumously published manuscript The Art of Being Right: 38 Ways to Win an Argument.
In Richard Jeffrey's utility theory there is actually a very natural distinction between positive and negative motivations/desires. A plausible axiom is U(⊤) = 0 (the tautology has zero desirability: you already know it's true). Together with the main axiom[1], this implies that the negation of any proposition with positive utility has negative utility, and vice versa. Which is intuitive: if something is good, its negation is bad, and the other way round. In particular, if U(A) = U(¬A) (indifference between A and ¬A), then U(A) = U(¬A) = 0.
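Here is a minimal sketch of how the sign flip falls out, assuming the main axiom is Jeffrey's averaging rule for incompatible propositions: if P(A ∧ B) = 0, then U(A ∨ B) = (P(A)U(A) + P(B)U(B)) / (P(A) + P(B)). Applied to A and ¬A, together with U(⊤) = 0, it gives P(A)U(A) + P(¬A)U(¬A) = 0, hence U(¬A) = −P(A)U(A) / P(¬A). The function name and the numbers are illustrative only.

```python
from fractions import Fraction


def negation_desirability(p_a, u_a):
    """Return U(not A) given P(A) and U(A).

    Assumes U(tautology) = 0 and Jeffrey's averaging axiom, which for A and
    not-A gives P(A) * U(A) + P(not A) * U(not A) = U(tautology) = 0.
    """
    p_not_a = 1 - p_a
    if p_not_a == 0:
        raise ValueError("U(not A) is undefined when P(not A) = 0")
    return -p_a * u_a / p_not_a


# A good proposition (positive utility) has a bad negation.
print(negation_desirability(Fraction(3, 4), Fraction(2)))  # -6

# U(not A) equals U(A) only when both are 0: the indifference case.
print(negation_desirability(Fraction(1, 2), Fraction(0)))  # 0
```

Exact fractions are used only to keep the arithmetic free of rounding noise.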
More generally, ...
conducive to well-being
That in itself isn't a good definition, because it doesn't distinguish ethics from, e.g., medicine... and it doesn't tell you whose well-being. De facto, people are ethically obliged to do things which are against their well-being and to refrain from doing some things which promote their own well-being... I can't rob people to pay my medical bills.
Promoting your own well-being only would be egoism, while ethics seems to be more similar to altruism.
Whose desires?
I guess of all beings that are conscious. Perhaps relative to their degr...
Many attempts at establishing an objective morality try to argue from considerations of human well-being. OK, but who decided that human well-being is what is important? We did!
That's a rather minimal amount of subjectivism. Everything downstream of that can be objective, so it's really a compromise position.
It's also possible (and I think very probable) that "ethical" means something like "conducive to well-being". Similar to how "tree" means something like "plant with a central wooden trunk". Imagine someone objecting: "OK, but who decided that tr...
That's some careful analysis!
Two remarks:
"Can" is the opposite of "unable". "Unable" means that the change involves granting ability to they who would act, i.e. teaching a technique, providing a tool, fixing the body, or altering the environment.
That's a good characterization, though arguably not a definition, as it relies on "ability", which is circular. I can do something = I have the ability to do something. I can = I'm able to.
But we can use the initial principle (it really needs a name) which doesn't mention ability:
...You do a thing iff you can
Your headline overstates the results. The last common ancestor of birds and mammals probably wasn't exactly unintelligent. (In contrast to our last common ancestor with the octopus, as the article discusses.)
"the" supposes there's exactly one canonical choice for what object in the context is indicated by the predicate. When you say "the cat" there's basically always a specific cat from context you're talking about. "The cat is in the garden" is different from "There's exactly one cat in the garden".
Yes, we have a presupposition that there is exactly one cat. But that presupposition is the same regardless of the actual number of cats (regardless of the context), because the "context" here is a feature of the external world ("territory"), while the belief is...
What I was saying was that we can, from our subjective perspective, only "point" to or "refer" to objects in a certain way. In terms of predicate logic the two ways of referring are via a) individual constants and b) variable quantification. The first corresponds to direct reference, where the reference always points to exactly one object. Mental objects can presumably be referred to directly. For other objects, like physical ones, quantifiers have to be used. Like "at least one" or "the" (the latter only presupposes there is exactly one object satisfying ...
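The two ways of referring can be sketched on a hypothetical finite model. This assumes a Strawson-style reading on which "the F" presupposes exactly one F and yields no referent (here `None`) when the presupposition fails; the domain, the predicates, and the helper `the` are all made up for illustration.

```python
# Hypothetical finite model: a domain of objects and predicate extensions.
domain = ("tom", "felix", "rex")
cat = {"tom", "felix"}
in_garden = {"tom", "rex"}

# a) Direct reference: an individual constant picks out exactly one object.
constants = {"t": "tom"}
print(constants["t"] in in_garden)  # True

# b) Quantification: "at least one cat is in the garden".
print(any(x in cat and x in in_garden for x in domain))  # True


def the(pred, domain):
    """Definite description "the F" under a presupposition reading.

    Returns the unique object satisfying pred, or None when the
    presupposition (exactly one such object) fails.
    """
    matches = [x for x in domain if pred(x)]
    return matches[0] if len(matches) == 1 else None


print(the(lambda x: x in cat, domain))  # None: two cats, presupposition fails
print(the(lambda x: x in cat and x in in_garden, domain))  # tom
```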
I think object identification is important if we want to analyze beliefs instead of sentences. For beliefs we can't take a third person perspective and say "it's clear from context what is meant". Only the agent knows what he means when he has a belief (or she). So the agent has to have a subjective ability to identify things. For "I" this is unproblematic, because the agent is presumably internal and accessible to himself and therefore can be subjectively referred to directly. But for "this" (and arguably also for terms like "tomorrow") the referred objec...
Yeah. I proposed a while ago that all the AI content was becoming so dominant that it should be hived off to the Alignment Forum while LessWrong is for all the rest. This was rejected.
Maybe I missed it, but what about indexical terms like "I", "this", "now"?
There is still the possibility on the front page to filter out the AI tag completely.
That difference is rather extreme. It seems LLM companies have a strong winner-take-all market tendency. Similar to Google (web search) or Amazon (online retail) in the past. It seems now much more likely to me that ChatGPT has basically already won the LLM race, similar to how Google won the search engine race in the past. Gemini outperforming ChatGPT in a few benchmarks likely won't make a difference.
[...] because it is embedded natively, deep in the architecture of our omnimodal GPT‑4o model, 4o image generation can use everything it knows to apply these capabilities in subtle and expressive ways [...] Unlike DALL·E, which operates as a diffusion model, 4o image generation is an autoregressive model natively embedded within ChatGPT.
To operationalise this: a decision theory usually assumes that you have some number of options, each with some defined payout. Assuming payouts are fixed, all decision theories simply advise you to pick the outcome with the highest utility.
The theories typically assume that each choice option has a number of known, mutually exclusive (and jointly exhaustive) possible outcomes. And to each outcome the agent assigns a utility and a probability. So uncertainty is in fact modelled, insofar as the agent can assign subjective probabilities to those outcomes occurr...
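A minimal sketch of this setup, assuming expected-utility maximization as the decision rule: each option lists its outcomes as (probability, utility) pairs, and the agent picks the option whose probability-weighted utility is highest. The umbrella scenario and all numbers are hypothetical.

```python
# Hypothetical decision problem: each option maps to its mutually exclusive,
# jointly exhaustive outcomes as (probability, utility) pairs.
options = {
    "take umbrella": [(0.3, 5.0), (0.7, 8.0)],      # (rain, no rain)
    "leave umbrella": [(0.3, -10.0), (0.7, 10.0)],  # (rain, no rain)
}


def expected_utility(outcomes):
    """Probability-weighted sum of utilities over an option's outcomes."""
    if abs(sum(p for p, _ in outcomes) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)


def best_option(options):
    """Pick the option with the highest expected utility."""
    return max(options, key=lambda name: expected_utility(options[name]))


for name, outcomes in options.items():
    print(name, round(expected_utility(outcomes), 2))
print(best_option(options))  # take umbrella
```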
(This is off-topic but I'm not keen on calling LLMs "he" or "she". Grok is not a man, nor a woman. We shouldn't anthropomorphize language models. We already have an appropriate pronoun for those: "it")
There is also Deliberation in Latent Space via Differentiable Cache Augmentation by Liu et al. and Efficient Reasoning with Hidden Thinking by Shen et al.
I think picking axioms is not necessary here and in any case inconsequential.
By picking your axioms you logically pinpoint what you are talking about in the first place. Have you read Highly Advanced Epistemology 101 for Beginners? I'm noticing that our inferential distance is larger than it should be otherwise.
I read it a while ago, but he overstates the importance of axiom systems. E.g., he wrote:
...You need axioms to pin down a mathematical universe before you can talk about it in the first place. The axioms are pinning down what the heck this 'NUM
I wouldn't generally dismiss an "embarrassing & confusing public meltdown" when it comes from a genius, because I'm not a genius while he or she is. So it's probably me who is wrong rather than him. Well, unless the majority of comparable geniuses agree with me rather than with him. Though geniuses are rare, and majorities are hard to come by. I still remember an (at the time) "embarrassing and confusing meltdown" by some genius.
My point is that if your picking of particular axioms is entangled with reality, then you are already using a map to describe some territory. And then you can just as well describe this territory more accurately.
I think picking axioms is not necessary here and in any case inconsequential. "Bachelors are unmarried" is true whether or not I regard it as some kind of axiom. It seems the same holds for tautologies and probabilistic laws. Moreover, I think neither of them is really "entangled" with reality, in the sense that they are compatible with an...
Do you really have access to the GPT-4 base (foundation) model? Why? It's not publicly available.
Yes, the meaning of a statement depends causally on empirical facts. But this doesn't imply that the truth value of "Bachelors are unmarried" depends less than completely on its meaning. Its meaning (M) screens off the empirical facts (E) from its truth value (T). The causal graph looks like this:
E —> M —> T
If the distribution satisfies the causal Markov condition for this graph, it follows that E and T are conditionally independent given M: P(T | M, E) = P(T | M). So if you know M, E gives you no additional information about T.
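The screening-off claim can be checked numerically on a toy version of the chain. This is a minimal sketch with made-up numbers, assuming binary variables and a joint distribution that factorizes along E → M → T; T is made a deterministic function of M, mirroring an analytic statement whose truth is fixed by its meaning.

```python
from fractions import Fraction
from itertools import product

# Hypothetical chain E -> M -> T with binary variables:
# E = an empirical fact about usage, M = which meaning the sentence has,
# T = whether the sentence is true (1) or false (0).
p_e = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p_m_given_e = {0: {0: Fraction(9, 10), 1: Fraction(1, 10)},
               1: {0: Fraction(2, 10), 1: Fraction(8, 10)}}
# Truth depends on meaning alone: meaning 0 makes it true, meaning 1 false.
p_t_given_m = {0: {0: Fraction(0), 1: Fraction(1)},
               1: {0: Fraction(1), 1: Fraction(0)}}


def joint(e, m, t):
    # Factorization along the chain: P(e) * P(m | e) * P(t | m)
    return p_e[e] * p_m_given_e[e][m] * p_t_given_m[m][t]


def prob(pred):
    return sum(joint(e, m, t)
               for e, m, t in product((0, 1), repeat=3) if pred(e, m, t))


def screens_off():
    """True iff P(T=t | M=m, E=e) == P(T=t | M=m) for every e, m, t."""
    for e, m, t in product((0, 1), repeat=3):
        lhs = (prob(lambda E, M, T: (E, M, T) == (e, m, t))
               / prob(lambda E, M, T: (E, M) == (e, m)))
        rhs = (prob(lambda E, M, T: (M, T) == (m, t))
               / prob(lambda E, M, T: M == m))
        if lhs != rhs:
            return False
    return True


print(screens_off())  # True

# Without conditioning on M, E is informative about T:
print(prob(lambda E, M, T: E == 0 and T == 1) / p_e[0])  # 9/10
print(prob(lambda E, M, T: T == 1))  # 11/20
```

The last two lines show why the conditioning matters: E does carry information about T, but only via M.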
And the same is the case for all "analytic" statements, where the truth value only d...
It seems clear to me that statements expressing logical or probabilistic laws like or are "analytic". Similar to "Bachelors are unmarried".
The truth of a statement in general is determined by two things: its meaning and what the world is like. But for some statements the latter part is irrelevant, and their meanings alone are sufficient to determine their truth or falsity.
Not to remove all limitations: I think the probability axioms are a sort of "logic of sets of beliefs". If the axioms are violated the belief set seems to be irrational. (Or at least the smallest incoherent subset that, if removed, would make the set coherent.) Conventional logic doesn't work as a logic for belief sets, as the preface and lottery paradox show, but subjective probability theory does work. As a justification for the axioms: that seems a similar problem to justifying the tautologies / inference rules of classical logic. Maybe an instrumental ...
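The lottery paradox can be made concrete with a small enumeration. This is a sketch assuming a fair lottery with a hypothetical 100 tickets and exactly one winner, plus a hypothetical rule that one "believes" whatever has probability at least 0.95. Each belief "ticket i loses" clears the threshold, yet their conjunction has probability 0, so closing the believed set under conjunction would produce a belief that is certainly false, while the underlying probabilities stay perfectly coherent.

```python
from fractions import Fraction

N = 100  # hypothetical number of tickets; exactly one of them wins
worlds = range(N)  # world w: "ticket w is the winner", each with credence 1/N


def prob(event):
    """Credence in an event, given as a predicate on worlds (uniform prior)."""
    return Fraction(sum(1 for w in worlds if event(w)), N)


# Default argument i=i binds each ticket number at lambda creation time.
p_each_loses = [prob(lambda w, i=i: w != i) for i in range(N)]
p_all_lose = prob(lambda w: all(w != i for i in range(N)))

threshold = Fraction(95, 100)  # hypothetical acceptance threshold
print(all(p >= threshold for p in p_each_loses))  # True: each is "believed"
print(p_all_lose)  # 0: yet their conjunction is certainly false
```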
Well, technically P(Ω)=1 is an axiom, so you do need a sample space if you want to adhere to the axioms.
For a propositional theory this axiom is replaced with P(⊤)=1, i.e. a tautology in classical propositional logic receives probability 1.
But sure, if you do not care about accurate beliefs and systematic ways to arrive at them at all, then the question is, indeed, not interesting. Of course, then it's not clear what use probability theory is to you in the first place.
Degrees of belief adhering to the probability calculus at any point in time rules...
And how would you know which worlds are possible and which are not?
Yes, that's why I only said "less arbitrary".
Regarding "knowing": In subjective probability theory, the probability over the "event" space is just about what you believe, not about what you know. You could theoretically believe to degree 0 in the propositions "the die comes up 6" or "the die lands at an angle". Or that the die comes up as both 1 and 2 with some positive probability. There is no requirement that your degrees of belief are accurate relative to some external standard. It is...
A less arbitrary way to define a sample space is to take the set of all possible worlds. Each event, e.g. a die roll, corresponds to the disjunction of possible worlds where that event happens. The possible worlds can differ in a lot of tiny details, e.g. the exact position of a die on the table. Even just an atom being different at the other end of the galaxy would constitute a different possible world. A possible world is a maximally specific way the world could be. So two possible worlds are always mutually exclusive. And the set of all possible worlds ...
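A minimal sketch of this construction, assuming a drastically coarse-grained toy set of "worlds" (die face plus which half of the table the die lands on) and a uniform credence, both hypothetical. Events are sets of worlds, an event's probability is the sum of the credences of its worlds, and the set of all worlds, the tautology, gets probability 1 as the axiom requires.

```python
from fractions import Fraction
from itertools import product

# Hypothetical, drastically coarse-grained "possible worlds": each world fixes
# the die face and one further detail (which half of the table it lands on).
# Real possible worlds would be maximally specific; this is only a toy model.
faces = range(1, 7)
positions = ("left", "right")
worlds = list(product(faces, positions))

# A subjective credence function over worlds (uniform here, but any
# non-negative weights summing to 1 would satisfy the axioms).
credence = {w: Fraction(1, len(worlds)) for w in worlds}


def event(pred):
    """An event is the set of worlds in which it happens."""
    return frozenset(w for w in worlds if pred(w))


def prob(ev):
    return sum(credence[w] for w in ev)


six = event(lambda w: w[0] == 6)        # {(6, "left"), (6, "right")}
tautology = event(lambda w: True)       # every world: the whole sample space
contradiction = event(lambda w: False)  # no world

print(prob(six), prob(tautology), prob(contradiction))  # 1/6 1 0
```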
I think the main problem from this evolutionary perspective is not so much entertainment and art, but low fertility. Not having children.
A drug that fixes akrasia without major side-effects would indeed be the Holy Grail. Unfortunately I don't think caffeine does anything of that sort. For me it increases focus, but it doesn't combat weakness of will, avoidance behavior, ugh fields. I don't know about other existing drugs.
I think the main reason is that until a few years ago, not much AI research came out of China. Gwern highlighted this repeatedly.
I agree with the downvoters that the thesis of this post seems crazy. But aren't entertainment and art superstimuli? Aren't they forms of wireheading?
Hedonic and desire theories are perfectly standard, we had plenty of people talking about them here, including myself. Jeffrey's utility theory is explicitly meant to model (beliefs and) desires. Both are also often discussed in ethics, including over at the EA Forum. Daniel Kahneman has written about hedonic utility. To equate money with utility is a common simplification in many economic contexts, where expected utility is actually calculated, e.g. when talking about bets and gambles. Even though it isn't held to be perfectly accurate. I didn't encounter...
A more ambitious task would be to come up with a model that is more sophisticated than decision theory, one which tries to formalize your previous comment about intent and prediction/belief.
Interesting. This reminds me of a related thought I had: Why do models with differential equations work so often in physics but so rarely in other empirical sciences? Perhaps physics simply is "the differential equation science".
Which is also related to the frequently expressed opinion that philosophy makes little progress because everything that gets developed enough to make significant progress splits off from philosophy. Because philosophy is "the study of ill-defined and intractable problems".
Not saying that I think these views are accurate, though they do have some plausibility.
It seems to be only "deception" if the parent tries to conceal the fact that he or she is simplifying things.
There is also the related problem of intelligence being negatively correlated with fertility, which leads to a dysgenic trend. Even if preventing people below a certain level of intelligence from having children were realistically possible, it would make another problem more severe: the fertility of smarter people is far below replacement, leading to quickly shrinking populations. Though fertility is likely partially heritable, and would go up again after some generations, once the descendants of the (currently rare) high-fertility people start to dominate.
This seems to be a relatively balanced article which discusses several concepts of utility with a focus on their problems, while acknowledging some of their use cases. I don't think the downvotes are justified.
That's an interesting perspective. Only, it doesn't seem to fit into the simplified but neat picture of decision theory. There everything is sharply divided between being either a statement we can make true at will (an action we can currently decide to perform) and to which we therefore do not need to assign any probability (have a belief about it happening), or an outcome, which we can't make true directly, that is at most a consequence of our action. We can assign probabilities to outcomes, conditional on our available actions, and a value, which lets us com...
Maybe this is avoided by KV caching?
This is not how many decisions feel to me - many decisions are exactly a belief (complete with Bayesian uncertainty). A belief in future action, to be sure, but it's distinct in time from the action itself.
But if you only have a belief that you will do something in the future, you still have to decide, when the time comes, whether to carry out the action or not. So your previous belief doesn't seem to be an actual decision, but rather just a belief about a future decision -- about which action you will pick in the future.
See Spohn's example about belie...
Decision screens off thought from action. When you really make a decision, that is the end of the matter, and the actions to carry it out flow inexorably.
Yes, but that arguably means we only make decisions about which things to do now. Because we can't force our future selves to follow through, to inexorably carry out something. See here:
...Our past selves can't simply force us to do certain things, the memory of a past "commitment" is only one factor that may influence our present decision making, but it doesn't replace a decision. Otherwise, always whe
Dreams exhibit many incoherencies. You can notice them and become "lucid". Video games are also incoherent. They don't obey some simple but extremely computationally demanding laws. They instead obey complicated laws that are not very computationally demanding. They cheat with physics for efficiency reasons, and those cheats are very obvious. Our real physics, however, hasn't uncovered such apparent cheats. Physics doesn't seem incoherent, it doesn't resemble a video game or a dream.