The subproblem of environmental goals is just to make AI care about natural enough (from the human perspective) "causes" of sensory data, not to align AI to the entirety of human values. Fundamental variables have no (direct) relation to the latter problem.
However, fundamental variables would be helpful for defining impact measures if we had a principled way to differentiate "times when it's OK to sidestep fundamental variables" from "times when it's NOT OK to sidestep fundamental variables". That's where the things you're talking about definitely become a...
Thank you for actually engaging with the idea (pointing out problems and whatnot) rather than just suggesting reading material.
Btw, would you count a data packet as an object you move through space?
A couple of points:
Epistemic status: Draft of a post. I want to propose a method of learning environmental goals (a super big, super important subproblem in Alignment). It's informal, so it has a lot of gaps. I worry I missed something obvious, rendering my argument completely meaningless. I asked the LessWrong feedback team, but they couldn't get someone knowledgeable enough to take a look.
Can you tell me the biggest conceptual problems of my method? Can you tell me if agent foundations researchers are aware of this method or not?
If you're not familiar with the problem, here's the...
Sorry if this isn't appropriate for this site, but is anybody interested in chess research? I've seen that people here might be interested in chess; for example, here's a chess post barely related to AI.
In chess, what positions have the longest forced wins? "Mate in N" positions can be split into 3 types:
Agree that neopronouns are dumb. Wikipedia says they're used by 4% of LGBTQ people and criticized both within and outside the community.
But for people struggling with normal pronouns (he/she/they), I have the following thoughts:
I think there should be more spaces where controversial ideas can be debated. I'm not against spaces without pronoun rules, I just don't think every place should be like that. Also, if we create a space for political debate, we need to really make sure that the norms don't punish everyone who opposes centrism & the right. (Over-sensitive norms like "if you said that some opinion is transphobic, you're uncivil/shaming/manipulative and should get banned" might do this.) Otherwise it's not free speech either. It will just produce another Grey or Red Tribe inste...
I'll describe my general thoughts, like you did.
I think about transness in a similar way to how I think about homo/bisexuality.
Draft of a future post, any feedback is welcome. Continuation of a thought from this shortform post.
(picture: https://en.wikipedia.org/wiki/Drawing_Hands)
There's an alignment-related problem: how do we make an AI care about causes of a particular sensory pattern? What are "causes" of a particular sensory pattern in the first place? You want the AI to differentiate between "putting a real strawberry on a plate" and "creating a perfect illusion of a strawberry on a plate", but what's the difference between doing real things and creating perfec...
Napoleon is merely an argument for "just because you strongly believe it, even if it is a statement about you, does not necessarily make it true".
When people make arguments, they often don't list all of the premises. That's not unique to trans discourse; informal reasoning is hard to make fully explicit. "Your argument doesn't explicitly exclude every counterexample" is a pretty cheap counter-argument. What people experience is important evidence and an important factor, and it's rational to bring it up instead of stopping yourself with "wait, I'm not allowed ...
Even if we assume that there should be a crisp physical cause of "transness" (which is already a value-laden choice), we need to make a couple of value-laden choices before concluding whether "being trans" is similar to "believing you're Napoleon" or not. Without more context it's not clear why you bring up Napoleon. I assume the idea is "if gender = hormones (gender essentialism), and trans people have the right hormones, then they're not deluded". But you can arrive at the same conclusion ("trans people are not deluded") by means other than gender essentialis...
There are people who feel strongly that they are Napoleon. If you want to convince me, you need to make a stronger case than that.
It's confusing to me that you go to the "I identify as an attack helicopter" argument after treating biological sex as private information & respecting pronouns out of politeness. I thought you had already realized that "choosing your gender identity" and "being deluded that you're another person" are different categories.
...If someone presented as male for 50 years, then changed to female, it makes sense to use "he" to refer to their f
Meta-level comment: I don't think it's good to dismiss original arguments immediately and completely.
Object-level comment:
Neither of those claims has anything to do with humans being the “winners” of evolution.
I think it might be more complicated than that:
My point is that chairs and humans can be considered in a similar way.
Please explain how your point connects to my original message: are you arguing with it or supporting it or want to learn how my idea applies to something?
I see. But I'm not talking about figuring out human preferences, I'm talking about finding world-models in which real objects (such as "strawberries" or "chairs") can be identified. Sorry if it wasn't clear in my original message because I mentioned "caring".
Models of real objects or things capture something that is not literally present in the world. The world contains shadows of these things, and the most straightforward way of finding models is by looking at the shadows and learning from them.
You might need to specify what you mean a little bit.
The ...
Creating an inhumanly good model of a human is related to formulating their preferences.
How does this relate to my idea? I'm not talking about figuring out human preferences.
Thus it's a step towards eliminating path-dependence of particular life stories
What is "path-dependence of particular life stories"?
I think things (minds, physical objects, social phenomena) should be characterized by computations that they could simulate/incarnate.
Are there other ways to characterize objects? Feels like a very general (or even fully general) framework. I believe my idea can be framed like this, too.
There's an alignment-related problem, the problem of defining real objects. Relevant topics: environmental goals; task identification problem; "look where I'm pointing, not at my finger"; The Pointers Problem; Eliciting Latent Knowledge.
I think I realized how people go from caring about sensory data to caring about real objects. But I need help with figuring out how to capitalize on the idea.
So... how do humans do it?
I don't understand the Model-Utility Learning (MUL) section: what pathological behavior does the AI exhibit?
...Since humans (or something) must be labeling the original training examples, the hypothesis that building bridges means "what humans label as building bridges" will always be at least as accurate as the intended classifier. I don't mean "whatever humans would label". I mean the hypothesis that "build a bridge" means specifically the physical situations which were recorded as training examples for this system in particular, and labeled by humans as such.
I'm noticing two things:
Often I see people dismiss the things the Epicureans got right with an appeal to their lack of the scientific method, which has always seemed a bit backwards to me.
The most important thing, I think, is not even hitting the nail on the head, but knowing (i.e. really acknowledging) that a nail can be hit in multiple places. If you know that, the rest is just a matter of testing.
But avoidance of value drift or of unendorsed long term instability of one's personality is less obvious.
What if endorsed long term instability leads to negation of personal identity too? (That's something I thought about.)
I think corrigibility is the ability to change a value/goal system. That's the literal meaning of the term... "Correctable". If an AI were fully aligned, there would be no need to correct it.
Perhaps I should make a better argument:
It's possible that AGI is correctable, but (a) we don't know what needs to be corrected, or (b) we cause new, less noticeable problems while correcting it.
So, I think there aren't two assumptions ("alignment/interpretability is not solved" + "AGI is incorrigible"), but only one: "alignment/interpretability is not solved". (A strong...
It's not aligned at every possible point in time.
I think corrigibility is "AGI doesn't try to kill everyone and doesn't try to prevent/manipulate its modification". Therefore, in some global sense such AGI is aligned at every point in time. Even if it causes a local disaster.
Over 90%, as I said
Then I agree, thank you for re-explaining your opinion. But I think other probabilities count as high too.
To me, the ingredients of danger (but not "> 90%") are these:
why is “superintelligence + misalignment” highly conjunctive?
In the sense that matters, it needs to be fast, surreptitious, incorrigible, etc.
What opinion are you currently arguing? That the risk is below 90% or something else? What counts as "high probability" for you?
Incorrigible misalignment is at least one extra assumption.
I think "corrigible misalignment" doesn't exist, corrigble AGI is already aligned (unless AGI can kill everyone very fast by pure accident). But we can have differently defined terms. To avoid confusion, please give example...
I've confused you with people who deny that a misaligned AGI is even capable of killing most humans. Glad to be wrong about you.
But I am not saying that the doom is unlikely given superintelligence and misalignment, I am saying the argument that gets there -- superintelligence + misalignment -- is highly conjunctive. The final step, the execution as it were, is not highly conjunctive.
But I don't agree that it's highly conjunctive.
Yes, I probably mean something other than ">90%".
[lists of various catastrophes, many of which have nothing to do with AI]
Why are you doing this? I did not say there is zero risk of anything. (...) Are you using "risk" to mean the probability of the outcome, or the impact of the outcome?
My argument is based on comparing the phenomenon of AGI to other dangerous phenomena. The argument is intended to show that a bad outcome is likely (if AGI wants to do a bad thing, it can achieve it) and that the impact of the outcome can kill most humans.
...I think its
Informal logic is more holistic than not, I think, because it relies on implicit assumptions.
It's not black and white. I don't think they are zero risk, and I don't think it is Certain Doom, so it's not what I am talking about. Why are you bringing it up? Do you think there is a simpler argument for Certain Doom?
Could you proactively describe your opinion? Or re-describe it, by adding relevant details. You seemed to say "if hard takeoff, then likely doom; but hard takeoff is unlikely, because hard takeoff requires a conjunction of things to be true". I...
Why? I'm saying p(doom) is not high. I didn't mention P(otherstuff).
To be able to argue something (/decide how to go about arguing something), I need to have an idea about your overall beliefs.
That doesn't imply a high probability of mass extinction.
Could you clarify what your own opinion even is? You seem to agree that rapid self-improvement would mean likely doom. But you aren't worried about gradual self-improvement or AGI being dangerously smart without much (self-)improvement?
I think I have already answered that: I don't think anyone is going to deliberately build something they can't control at all. So the probability of mass extinction depends on creating an uncontrollable superintelligence accidentally -- for instance, by rapid recursive self-improvement. And RRSI, AKA Foom Doom, is a conjunction of claims, all of which are p<1, so it is not high probability.
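For reference, here's a minimal sketch of the arithmetic behind the quoted "conjunction of claims, all of which are p<1" point. The individual probabilities are made up purely for illustration, and the claims are treated as independent, which need not hold:

```python
# Hypothetical probabilities for the individual claims in the conjunction;
# none of these numbers come from the discussion above.
from math import prod

claim_probabilities = [0.8, 0.7, 0.6, 0.5]

# The joint probability of independent claims is their product,
# which shrinks as more claims are added.
joint = prod(claim_probabilities)
print(joint)  # 0.168 -- well below any single claim's probability
```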
I agree that probability mostly depends on accidental AGI. I don't agree that probability mostly depends on (very) hard takeoff. I believe probability mostly depen...
I want to discuss this topic with you iff you're ready to proactively describe the cruxes of your own beliefs. I believe in likely doom and I don't think the burden of proof is on "doomers".
Maybe there just isn't a good argument for Certain Doom (or at least high probability near-extinction). I haven't seen one
What do you expect to happen when you're building uninterpretable technology, without safety guarantees, that is smarter than all of humanity? It looks like the most dangerous possible technology, with the worst safety and the least potential to control it.
To me, thos...
You are correct that critical thinkers may want to censor uncritical thinkers. However, independent-minded thinkers do not want to censor conventional-minded thinkers.
I still don't see it; I don't see a causal mechanism that would produce it, even if we replace "independent-minded" with "independent-minded and valuing independent-mindedness for everyone". I have the same problems with it as Ninety-Three and Raphael Harth.
To give my own example: algorithms in social media could be a little too good at radicalizing and connecting people with crazy opinions, s...
We only censor other people more-independent-minded than ourselves. (...) Independent-minded people do not censor conventional-minded people.
I'm not sure that's true. Not sure I can interpret the "independent/dependent" distinction.
I tried to describe necessary conditions which are needed for society and culture to exist. Do you agree that what I've described are necessary conditions?
I realize I'm pretty unusual in this regard, which may be biasing my views. However, I think I am possibly evidence against the notion that a desire to leave a mark on the culture is fundamental to human identity.
Relevant part of my argument was "if your personality gets limitlessly copied and modified, your personality doesn't exist (in the cultural sense)". You're talking about something different, y...
I think we can just judge by the consequences (here "consequences" don't have to refer to utility calculus). If some way of "injecting" art into culture is too disruptive, we can decide not to allow it. It doesn't matter who makes the injection or how.
To exist — not only for itself, but for others — a consciousness needs a way to leave an imprint on the world. An imprint which could be recognized as conscious. Similar thing with personality. For any kind of personality to exist, that personality should be able to leave an imprint on the world. An imprint which could be recognized as belonging to an individual.
Uncontrollable content generation can, in principle, undermine the possibility of consciousness being "visible" and undermine the possibility of any kind of personality/individuality. And without t...
Thank you for the answer, it clarifies your opinion a lot!
Artistic expression, of course, is something very different. I'm definitely going to keep making art in my spare time for the rest of my life, for the sake of fun and because there are ideas I really want to get out. That's not threatened at all by AI.
I think there are some threats, at least hypothetical ones. For example, the "spam attack": people see that a painter starts to explore some very niche topic, and thousands of people start to generate thousands of paintings about the same very niche topic...
Maybe I've misunderstood your reply, but I wanted to say that hypothetically even humans can produce art in non-cooperative and disruptive ways, without breaking existing laws.
Imagine a silly hypothetical: one of the best human artists gets a time machine and starts offering their art for free. That artist functions like an image generator. Is such an artist doing something morally questionable? I would say yes.
Could you explain your attitudes towards art and art culture more in depth and explain how exactly your opinions on AI art follow from those attitudes? For example, how much do you enjoy making art and how conditional is that enjoyment? How much do you care about self-expression, in what way? I'm asking because this analogy jumped out at me as a little suspicious:
...And as terrible as this could be for my career, spending my life working in a job that could be automated but isn't would be as soul-crushing as being paid to dig holes and fill them in again. I
I like the angle you've explored. Humans are allowed to care about humans — and propagate that caring beyond its most direct implications. We're allowed to care not only about humans' survival, but also about human art and human communication and so on.
But I think another angle is also relevant: there are just cooperative and non-cooperative ways to create art (or any other output). If AI creates art in non-cooperative ways, it doesn't matter how the algorithm works or if it's sentient or not.
Thus, it doesn't matter in the least if it stifles human output, because the overwhelming majority of us who don't rely on our artistic talent to make a living will benefit from a post-scarcity situation for good art, as customized and niche as we care to demand.
How do you know that? Art is one of the biggest outlets of human potential; one of the biggest forces behind human culture and human communities; one of the biggest communication channels between people.
One doesn't need to be a professional artist to care about all that.
I think you're going for the most trivial interpretation instead of trying to explore interesting/unique aspects of the setup. (Not implying any blame. And those "interesting" aspects may not actually exist.) I'm not good at math, but not so bad that I don't know the most basic 101 idea of multiplying utilities by probabilities.
I'm trying to construct a situation (X) where the normal logic of probability breaks down, because each possibility is embodied by a real person and all those persons are in a conflict with each other.
Maybe it's impossible to construct ...
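To make the contrast concrete, here's a minimal sketch of the "101" calculation mentioned above: multiply each outcome's utility by its probability and sum. The probabilities and utilities are made up purely for illustration:

```python
# Standard expected-utility calculation; the numbers below are hypothetical
# and are not taken from the scenario being discussed.
outcomes = [
    (0.9, 10.0),   # probability 0.9 of an outcome worth +10
    (0.1, -50.0),  # probability 0.1 of a punishment worth -50
]

expected_utility = sum(p * u for p, u in outcomes)
print(expected_utility)  # 0.9*10 + 0.1*(-50) = 4.0
```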
For all intents and purposes it's equivalent to say "you have only one shot", and after memory erasure it's not you anymore, but a person equivalent to another version of you in the next room.
Let's assume "it's not you anymore" is false. At least for a moment (even if it goes against LDT or something else).
Yes, you have a 0.1 chance of being punished. But who cares if they will erase your memory anyway.
Let's assume that the persons do care.
To me, the initial poll options make no sense without each other. For example, "avoid danger" and "communicate beliefs" don't make sense without each other [in the context of society].
If people can't communicate (report their epistemic state), "avoid danger" may not help, or may be based on 100% biased opinions about what's dangerous.
Maybe you should edit the post to add something like this:
My proposal is not about the hardest parts of the Alignment problem. My proposal is not trying to solve theoretical problems with Inner Alignment or Outer Alignment (Goodhart, loopholes). I'm just assuming those problems won't be relevant enough. Or humanity simply won't create anything AGI-like (see CAIS).
...Instead of discussing the usual problems in Alignment theory, I merely argue X. X is not a universally accepted claim, here's evidence that it's not universally accepted: [write the evidence
Maybe there's a misunderstanding. Premise (1) makes sure that your proposal is different from any other proposal. It's impossible to reject premise (1) without losing the proposal's meaning.
Premise (1) is possible to reject only if you're not solving Alignment but solving some other problem.
...I'm arguing for open, external, effective legal systems as the key to AI alignment and safety. I see the implementation/instilling details as secondary. My usage refers to specifying rules/laws/ethics externally so they are available and usable by all intelligent syst
Perhaps the most important and (hopefully) actionable recommendation of the proposal is in the conclusion:
"For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with." (comment)
Sorry for sounding harsh. But to say something meaningful, I believe you have to argue two things:
I like how you explain your opinion, very clear and short, basically contained in a single bit of information: "you're not a random sample" or "this equivalence between 2 classes of problems can be wrong".
But I think you should focus on describing the opinion of others (in simple/new ways) too. Otherwise you're just repeating yourself over and over.
If you're interested, I could try helping to write a simplified guide to ideas about anthropics.
Additionally, this view ignores art consumers, who out-number artists by several orders of magnitude. It seems unfair to orient so much of the discussion of AI art's effects on the smaller group of people who currently create art.
What is the greater framework behind this argument? "Creating art" is one of the most general potentials a human being can realize. With your argument we could justify chopping off every human potential because "there's a greater number of people who don't care about realizing it".
I think deleting a key human potential (and a shared cultural context) affects the entire society.
A stupid question about anthropics and [logical] decision theories. Could we "disprove" some types of anthropic reasoning based on [logical] consistency? I struggle with math, so please keep the replies relatively simple.
Let's look at actual outcomes here. If every human says yes, 95% of them get to the afterlife. If every human says no, 5% of them get to the afterlife. So it seems better to say yes in this case, unless you have access to more information about the world than is specified in this problem. But if you accept that it's better to say yes here, then you've basically accepted the doomsday argument.
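Laying out the arithmetic above explicitly (a minimal sketch; the 95%/5% split comes from the hypothetical, and the total population is a made-up number):

```python
# Outcome-counting for the two uniform policies in the hypothetical above.
# N is a hypothetical total number of humans; the 0.95 / 0.05 fractions
# are the ones given in the reply.
N = 1_000_000

saved_if_all_say_yes = int(0.95 * N)  # the 95% for whom "yes" is correct
saved_if_all_say_no = int(0.05 * N)   # the 5% for whom "no" is correct

print(saved_if_all_say_yes, saved_if_all_say_no)  # 950000 50000
```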
There's a chance you're changing the nature of the situation by introducing Omega. Often "beliefs" and "betting strategy" go together, but here it may not be the case....
I assume we get an easily interpretable model where the difference bet...