The subproblem of environmental goals is just to make AI care about natural enough (from the human perspective) "causes" of sensory data, not to align AI to the entirety of human values. Fundamental variables have no (direct) relation to the latter problem.
However, fundamental variables would be helpful for defining impact measures if we had a principled way to differentiate "times when it's OK to sidestep fundamental variables" from "times when it's NOT OK to sidestep fundamental variables". That's where the things you're talking about definitely become a...
Thank you for actually engaging with the idea (pointing out problems and whatnot) rather than just suggesting reading material.
Btw, would you count a data packet as an object you move through space?
A couple of points:
Epistemic status: Draft of a post. I want to propose a method of learning environmental goals (a super big, super important subproblem in Alignment). It's informal, so it has a lot of gaps. I worry I missed something obvious, rendering my argument completely meaningless. I asked the LessWrong feedback team, but they couldn't get someone knowledgeable enough to take a look.
Can you tell me the biggest conceptual problems of my method? Can you tell me if agent foundations researchers are aware of this method or not?
If you're not familiar with the problem, here's the...
Sorry if this isn't appropriate for this site, but is anybody interested in chess research? I've seen that people here might be interested in chess; for example, here's a chess post barely related to AI.
In chess, what positions have the longest forced wins? "Mate in N" positions can be split into 3 types:
Agree that neopronouns are dumb. Wikipedia says they're used by 4% of LGBTQ people and criticized both within and outside the community.
But for people struggling with normal pronouns (he/she/they), I have the following thoughts:
I think there should be more spaces where controversial ideas can be debated. I'm not against spaces without pronoun rules, I just don't think every place should be like that. Also, if we create a space for political debate, we need to really make sure that the norms don't punish everyone who opposes centrism & the right. (Over-sensitive norms like "if you said that some opinion is transphobic, you're uncivil/shaming/manipulative and should get banned" might do this.) Otherwise it's not free speech either. It will just produce another Grey or Red Tribe inste...
I'll describe my general thoughts, like you did.
I think about transness in a similar way to how I think about homo/bisexuality.
Draft of a future post, any feedback is welcome. Continuation of a thought from this shortform post.
(picture: https://en.wikipedia.org/wiki/Drawing_Hands)
There's an alignment-related problem: how do we make an AI care about causes of a particular sensory pattern? What are "causes" of a particular sensory pattern in the first place? You want the AI to differentiate between "putting a real strawberry on a plate" and "creating a perfect illusion of a strawberry on a plate", but what's the difference between doing real things and creating perfec...
Napoleon is merely an argument for "just because you strongly believe it, even if it is a statement about you, does not necessarily make it true".
When people make arguments, they often don't list all of the premises. That's not unique to trans discourse; informal reasoning is hard to make fully explicit. "Your argument doesn't explicitly exclude every counterexample" is a pretty cheap counter-argument. What people experience is important evidence and an important factor, and it's rational to bring it up instead of stopping yourself with "wait, I'm not allowed ...
Even if we assume that there should be a crisp physical cause of "transness" (which is already a value-laden choice), we need to make a couple of value-laden choices before concluding whether "being trans" is similar to "believing you're Napoleon" or not. Without more context it's not clear why you bring up Napoleon. I assume the idea is "if gender = hormones (gender essentialism), and trans people have the right hormones, then they're not deluded". But you can arrive at the same conclusion ("trans people are not deluded") by means other than gender essentialis...
There are people who feel strongly that they are Napoleon. If you want to convince me, you need to make a stronger case than that.
It's confusing to me that you go to the "I identify as an attack helicopter" argument after treating biological sex as private information & respecting pronouns out of politeness. I thought you had already realized that "choosing your gender identity" and "being deluded that you're another person" are different categories.
...If someone presented as male for 50 years, then changed to female, it makes sense to use "he" to refer to their f
Meta-level comment: I don't think it's good to dismiss original arguments immediately and completely.
Object-level comment:
Neither of those claims has anything to do with humans being the “winners” of evolution.
I think it might be more complicated than that:
My point is that chairs and humans can be considered in a similar way.
Please explain how your point connects to my original message: are you arguing with it or supporting it or want to learn how my idea applies to something?
I see. But I'm not talking about figuring out human preferences, I'm talking about finding world-models in which real objects (such as "strawberries" or "chairs") can be identified. Sorry if it wasn't clear in my original message because I mentioned "caring".
Models of real objects or things capture something that is not literally present in the world. The world contains shadows of these things, and the most straightforward way of finding models is by looking at the shadows and learning from them.
You might need to specify what you mean a little bit.
The ...
Creating an inhumanly good model of a human is related to formulating their preferences.
How does this relate to my idea? I'm not talking about figuring out human preferences.
Thus it's a step towards eliminating path-dependence of particular life stories
What is "path-dependence of particular life stories"?
I think things (minds, physical objects, social phenomena) should be characterized by computations that they could simulate/incarnate.
Are there other ways to characterize objects? Feels like a very general (or even fully general) framework. I believe my idea can be framed like this, too.
There's an alignment-related problem, the problem of defining real objects. Relevant topics: environmental goals; task identification problem; "look where I'm pointing, not at my finger"; The Pointers Problem; Eliciting Latent Knowledge.
I think I realized how people go from caring about sensory data to caring about real objects. But I need help with figuring out how to capitalize on the idea.
So... how do humans do it?
I don't understand the Model-Utility Learning (MUL) section: what pathological behavior does the AI exhibit?
...Since humans (or something) must be labeling the original training examples, the hypothesis that building bridges means "what humans label as building bridges" will always be at least as accurate as the intended classifier. I don't mean "whatever humans would label". I mean the hypothesis that "build a bridge" means specifically the physical situations which were recorded as training examples for this system in particular, and labeled by humans as such.
I'm noticing two things:
Often I see people dismiss the things the Epicureans got right with an appeal to their lack of the scientific method, which has always seemed a bit backwards to me.
The most important thing, I think, is not even hitting the nail on the head, but knowing (i.e. really acknowledging) that a nail can be hit in multiple places. If you know that, the rest is just a matter of testing.
But avoidance of value drift or of unendorsed long term instability of one's personality is less obvious.
What if endorsed long term instability leads to negation of personal identity too? (That's something I thought about.)
I think corrigibility is the ability to change a value/goal system. That's the literal meaning of the term... "Correctable". If an AI were fully aligned, there would be no need to correct it.
Perhaps I should make a better argument:
It's possible that AGI is correctable, but (a) we don't know what needs to be corrected, or (b) we cause new, less noticeable problems while correcting it.
So, I think there aren't two assumptions ("alignment/interpretability is not solved" + "AGI is incorrigible"), but only one: "alignment/interpretability is not solved". (A strong...
It's not aligned at every possible point in time.
I think corrigibility is "AGI doesn't try to kill everyone and doesn't try to prevent/manipulate its modification". Therefore, in some global sense such AGI is aligned at every point in time. Even if it causes a local disaster.
Over 90%, as I said
Then I agree, thank you for re-explaining your opinion. But I think other probabilities count as high too.
To me, the ingredients of danger (but not "> 90%") are these:
why is “superintelligence + misalignment” highly conjunctive?
In the sense that matters, it needs to be fast, surreptitious, incorrigible, etc.
What opinion are you currently arguing? That the risk is below 90% or something else? What counts as "high probability" for you?
Incorrigible misalignment is at least one extra assumption.
I think "corrigible misalignment" doesn't exist, corrigble AGI is already aligned (unless AGI can kill everyone very fast by pure accident). But we can have differently defined terms. To avoid confusion, please give example...
I've confused you with people who deny that a misaligned AGI is even capable of killing most humans. Glad to be wrong about you.
But I am not saying that the doom is unlikely given superintelligence and misalignment, I am saying the argument that gets there -- superintelligence + misalignment -- is highly conjunctive. The final step, the execution as it were, is not highly conjunctive.
But I don't agree that it's highly conjunctive.
Yes, I probably mean something other than ">90%".
[lists of various catastrophes, many of which have nothing to do with AI]
Why are you doing this? I did not say there is zero risk of anything. (...) Are you using "risk" to mean the probability of the outcome, or the impact of the outcome?
My argument is based on comparing the phenomenon of AGI to other dangerous phenomena. The argument is intended to show that a bad outcome is likely (if AGI wants to do a bad thing, it can achieve it) and that the impact of the outcome can kill most humans.
...I think its
Informal logic is more holistic than not, I think, because it relies on implicit assumptions.
It's not black and white. I don't think they are zero risk, and I don't think it is Certain Doom, so it's not what I am talking about. Why are you bringing it up? Do you think there is a simpler argument for Certain Doom?
Could you proactively describe your opinion? Or re-describe it, by adding relevant details. You seemed to say "if hard takeoff, then likely doom; but hard takeoff is unlikely, because hard takeoff requires a conjunction of things to be true". I...
Why? I'm saying p(doom) is not high. I didn't mention P(otherstuff).
To be able to argue something (/decide how to go about arguing something), I need to have an idea about your overall beliefs.
That doesn't imply a high probability of mass extinction.
Could you clarify what your own opinion even is? You seem to agree that rapid self-improvement would mean likely doom. But you aren't worried about gradual self-improvement or AGI being dangerously smart without much (self-)improvement?
I think I have already answered that: I don't think anyone is going to deliberately build something they can't control at all. So the probability of mass extinction depends on creating an uncontrollable superintelligence accidentally -- for instance, by rapid recursive self-improvement. And RRSI, AKA Foom Doom, is a conjunction of claims, all of which are p<1, so it is not high probability.
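For reference, here's a minimal sketch of the arithmetic behind the quoted "conjunction of claims, all of which are p<1" point. The individual probabilities are made up purely for illustration, and the claims are treated as independent, which need not hold:

```python
# Hypothetical probabilities for the individual claims in the conjunction;
# none of these numbers come from the discussion above.
from math import prod

claim_probabilities = [0.8, 0.7, 0.6, 0.5]

# The joint probability of independent claims is their product,
# which shrinks as more claims are added.
joint = prod(claim_probabilities)
print(joint)  # 0.168 -- well below any single claim's probability
```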
I agree that probability mostly depends on accidental AGI. I don't agree that probability mostly depends on (very) hard takeoff. I believe probability mostly depen...
I want to discuss this topic with you iff you're ready to proactively describe the cruxes of your own beliefs. I believe in likely doom and I don't think the burden of proof is on "doomers".
Maybe there just isn't a good argument for Certain Doom (or at least high probability near-extinction). I haven't seen one
What do you expect to happen when you're building uninterpretable technology, without safety guarantees, that is smarter than all of humanity? It looks like the most dangerous possible technology, with the worst safety and the least potential to control it.
To me, thos...
You are correct that critical thinkers may want to censor uncritical thinkers. However, independent-minded thinkers do not want to censor conventional-minded thinkers.
I still don't see it; I don't see a causal mechanism that would produce it, even if we replace "independent-minded" with "independent-minded and valuing independent-mindedness for everyone". I have the same problems with it as Ninety-Three and Raphael Harth.
To give my own example: algorithms in social media could be a little too good at radicalizing and connecting people with crazy opinions, s...
We only censor other people more-independent-minded than ourselves. (...) Independent-minded people do not censor conventional-minded people.
I'm not sure that's true. Not sure I can interpret the "independent/dependent" distinction.
I tried to describe necessary conditions which are needed for society and culture to exist. Do you agree that what I've described are necessary conditions?
I realize I'm pretty unusual in this regard, which may be biasing my views. However, I think I am possibly evidence against the notion that a desire to leave a mark on the culture is fundamental to human identity.
Relevant part of my argument was "if your personality gets limitlessly copied and modified, your personality doesn't exist (in the cultural sense)". You're talking about something different, y...
I think we can just judge by the consequences (here "consequences" don't have to refer to utility calculus). If some way of "injecting" art into culture is too disruptive, we can decide not to allow it. It doesn't matter who makes the injection or how.
To exist — not only for itself, but for others — a consciousness needs a way to leave an imprint on the world. An imprint which could be recognized as conscious. Similar thing with personality. For any kind of personality to exist, that personality should be able to leave an imprint on the world. An imprint which could be recognized as belonging to an individual.
Uncontrollable content generation can, in principle, undermine the possibility of consciousness being "visible" and undermine the possibility of any kind of personality/individuality. And without t...
Thank you for the answer, it clarifies your opinion a lot!
Artistic expression, of course, is something very different. I'm definitely going to keep making art in my spare time for the rest of my life, for the sake of fun and because there are ideas I really want to get out. That's not threatened at all by AI.
I think there are some threats, at least hypothetical ones. For example, the "spam attack": people see that a painter starts to explore some very niche topic, and thousands of people start to generate thousands of paintings about the same very niche topic...
Maybe I've misunderstood your reply, but I wanted to say that hypothetically even humans can produce art in non-cooperative and disruptive ways, without breaking existing laws.
Imagine a silly hypothetical: one of the best human artists gets a time machine and starts offering their art for free. That artist functions like an image generator. Is such an artist doing something morally questionable? I would say yes.
Could you explain your attitudes towards art and art culture more in depth and explain how exactly your opinions on AI art follow from those attitudes? For example, how much do you enjoy making art and how conditional is that enjoyment? How much do you care about self-expression, in what way? I'm asking because this analogy jumped out at me as a little suspicious:
...And as terrible as this could be for my career, spending my life working in a job that could be automated but isn't would be as soul-crushing as being paid to dig holes and fill them in again. I
I like the angle you've explored. Humans are allowed to care about humans — and propagate that caring beyond its most direct implications. We're allowed to care not only about humans' survival, but also about human art and human communication and so on.
But I think another angle is also relevant: there are just cooperative and non-cooperative ways to create art (or any other output). If AI creates art in non-cooperative ways, it doesn't matter how the algorithm works or if it's sentient or not.
Thus, it doesn't matter in the least if it stifles human output, because the overwhelming majority of us who don't rely on our artistic talent to make a living will benefit from a post-scarcity situation for good art, as customized and niche as we care to demand.
How do you know that? Art is one of the biggest outlets of human potential; one of the biggest forces behind human culture and human communities; one of the biggest communication channels between people.
One doesn't need to be a professional artist to care about all that.
I think you're going for the most trivial interpretation instead of trying to explore interesting/unique aspects of the setup. (Not implying any blame. And those "interesting" aspects may not actually exist.) I'm not good at math, but not so bad that I don't know the most basic 101 idea of multiplying utilities by probabilities.
I'm trying to construct a situation (X) where the normal logic of probability breaks down, because each possibility is embodied by a real person and all those persons are in a conflict with each other.
Maybe it's impossible to construct ...
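To make the contrast concrete, here's a minimal sketch of the "101" calculation mentioned above: multiply each outcome's utility by its probability and sum. The probabilities and utilities are made up purely for illustration:

```python
# Standard expected-utility calculation; the numbers below are hypothetical
# and are not taken from the scenario being discussed.
outcomes = [
    (0.9, 10.0),   # probability 0.9 of an outcome worth +10
    (0.1, -50.0),  # probability 0.1 of a punishment worth -50
]

expected_utility = sum(p * u for p, u in outcomes)
print(expected_utility)  # 0.9*10 + 0.1*(-50) = 4.0
```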
For all intents and purposes it's equivalent to say "you have only one shot", and after memory erasure it's not you anymore, but a person equivalent to another version of you in the next room.
Let's assume "it's not you anymore" is false. At least for a moment (even if it goes against LDT or something else).
Yes, you have a 0.1 chance of being punished. But who cares if they will erase your memory anyway.
Let's assume that the persons do care.
To me, the initial poll options make no sense without each other. For example, "avoid danger" and "communicate beliefs" don't make sense without each other [in the context of society].
If people can't communicate (report their epistemic state), "avoid danger" may not help, or may be based on 100% biased opinions about what's dangerous.
Maybe you should edit the post to add something like this:
My proposal is not about the hardest parts of the Alignment problem. My proposal is not trying to solve theoretical problems with Inner Alignment or Outer Alignment (Goodhart, loopholes). I'm just assuming those problems won't be relevant enough. Or humanity simply won't create anything AGI-like (see CAIS).
...Instead of discussing the usual problems in Alignment theory, I merely argue X. X is not a universally accepted claim, here's evidence that it's not universally accepted: [write the evidence
Maybe there's a misunderstanding. Premise (1) makes sure that your proposal is different from any other proposal. It's impossible to reject premise (1) without losing the proposal's meaning.
Premise (1) is possible to reject only if you're not solving Alignment but solving some other problem.
...I'm arguing for open, external, effective legal systems as the key to AI alignment and safety. I see the implementation/instilling details as secondary. My usage refers to specifying rules/laws/ethics externally so they are available and usable by all intelligent syst
Perhaps the most important and (hopefully) actionable recommendation of the proposal is in the conclusion:
"For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with." (comment)
Sorry for sounding harsh. But to say something meaningful, I believe you have to argue two things:
I like how you explain your opinion, very clear and short, basically contained in a single bit of information: "you're not a random sample" or "this equivalence between 2 classes of problems can be wrong".
But I think you should focus on describing the opinion of others (in simple/new ways) too. Otherwise you're just repeating yourself over and over.
If you're interested, I could try helping to write a simplified guide to ideas about anthropics.
Additionally, this view ignores art consumers, who out-number artists by several orders of magnitude. It seems unfair to orient so much of the discussion of AI art's effects on the smaller group of people who currently create art.
What is the greater framework behind this argument? "Creating art" is one of the most general potentials a human being can realize. With your argument we could justify chopping off every human potential because "there's a greater number of people who don't care about realizing it".
I think deleting a key human potential (and a shared cultural context) affects the entire society.
A stupid question about anthropics and [logical] decision theories. Could we "disprove" some types of anthropic reasoning based on [logical] consistency? I struggle with math, so please keep the replies relatively simple.
Let's look at actual outcomes here. If every human says yes, 95% of them get to the afterlife. If every human says no, 5% of them get to the afterlife. So it seems better to say yes in this case, unless you have access to more information about the world than is specified in this problem. But if you accept that it's better to say yes here, then you've basically accepted the doomsday argument.
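Laying out the arithmetic above explicitly (a minimal sketch; the 95%/5% split comes from the hypothetical, and the total population is a made-up number):

```python
# Outcome-counting for the two uniform policies in the hypothetical above.
# N is a hypothetical total number of humans; the 0.95 / 0.05 fractions
# are the ones given in the reply.
N = 1_000_000

saved_if_all_say_yes = int(0.95 * N)  # the 95% for whom "yes" is correct
saved_if_all_say_no = int(0.05 * N)   # the 5% for whom "no" is correct

print(saved_if_all_say_yes, saved_if_all_say_no)  # 950000 50000
```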
There's a chance you're changing the nature of the situation by introducing Omega. Often "beliefs" and "betting strategy" go together, but here it may not be the case....
I assume we get an easily interpretable model where the difference bet...