All of Matrice Jacobine's Comments + Replies

FTR: You can choose your own commenting guidelines when writing or editing a post in the section "Moderation Guidelines".

[This comment is no longer endorsed by its author]
6habryka
Moderation privileges require passing various karma thresholds (for frontpage posts, it's 2000 karma, for personal posts it's 50).

I think your tentative position is correct and public-facing chatbots like Claude should lean toward harmlessness in the harmlessness-helpfulness trade-off, but (after an adaptation buffer) open-source models with no harmlessness training should be available as well.

This seems related to the 5-and-10 problem? Especially @Scott Garrabrant's version, considering logical induction is based on prediction markets.
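For readers who haven't run into it, here's a rough sketch of the 5-and-10 problem as I remember it from the embedded agency literature (my own paraphrase, not an exact statement of Garrabrant's version):

$$
U = \begin{cases} 10 & \text{if } A = 10 \\ 5 & \text{if } A = 5 \end{cases}
\qquad
A = \begin{cases} 5 & \text{if the agent proves } (A = 5 \to U = 5) \wedge (A = 10 \to U = 0) \\ 10 & \text{otherwise} \end{cases}
$$

The spurious counterfactual $A = 10 \to U = 0$ can end up provable via Löbian self-reference: if it is proved, the agent takes the $5, so $A = 10$ is false and the implication holds vacuously. Garrabrant's logical-induction version replaces the proof search with traders betting on logical sentences, which is what makes the prediction-market connection natural.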

You seem to smuggle in an unjustified assumption: that white-collar workers avoid thinking about taking over the world because they're unable to take over the world. Maybe they avoid thinking about it because that's just not the role they're playing in society.

White-collar workers avoid thinking about taking over the world because they're unable to take over the world, and they're unable to take over the world because their role in society doesn't involve that kind of thing. If a white-collar worker is somehow drafted for president of the United States, yo... (read more)

Human white-collar workers are unarguably agents in the relevant sense here (intelligent beings with desires and taking actions to fulfil those desires). The fact that they have no ability to take over the world has no bearing on this.

1Ebenezer Dukakis
The sense that's relevant to me is that of "agency by default" as I discussed previously: scheming, sandbagging, deception, and so forth.

You seem to smuggle in an unjustified assumption: that white-collar workers avoid thinking about taking over the world because they're unable to take over the world. Maybe they avoid thinking about it because that's just not the role they're playing in society.

In terms of next-token prediction, a super-powerful LLM told to play a "superintelligent white-collar worker" might simply do the same things that ordinary white-collar workers do, but better and faster. I think the evidence points towards this conclusion, because current LLMs are frequently mistaken, yet rarely try to take over the world. If the only thing blocking the convergent instrumental goal argument was a conclusion on the part of current LLMs that they're incapable of world takeover, one would expect that they would sometimes make the mistake of concluding the opposite, and trying to take over the world anyways.

The evidence best fits a world where LLMs are trained in such a way that makes them super-accurate roleplayers. As we add more data and compute, and make them generally more powerful, we should expect the accuracy of the roleplay to increase further -- including, perhaps, improved roleplay for exotic hypotheticals like "a superintelligent white-collar worker who is scrupulously helpful/honest/harmless". That doesn't necessarily lead to scheming, sandbagging, or deception.

I'm not aware of any evidence for the thesis that "LLMs only avoid taking over the world because they think they're too weak". Is there any reason at all to believe that they're even contemplating the possibility internally? If not, why would increasing their abilities change things? Of course, clearly they are "strong" enough to be plenty aware of the possibility of world takeover; presumably it appears a lot in their training data. Yet it ~only appears to cross their mind if it would

... do you deny human white-collar workers are agents?

1Ebenezer Dukakis
Agency is not a binary. Many white-collar workers are not very "agenty" in the sense of coming up with sophisticated and unexpected plans to trick their boss.

LLMs are agent simulators. Why would they contemplate takeover more frequently than the kind of agent they are induced to simulate? You don't expect a human white-collar worker, even one who makes mistakes all the time, to contemplate world domination plans, let alone attempt one. You could, however, expect the head of state of a world power to do so.

1Ebenezer Dukakis
Maybe not; see OP. Yes, this aligns with my current "agency is not the default" view.
1Lucie Philippon
Yeah, the last post was two years ago. The Cyborgism and Simulators posts improved my thinking and AI strategy. The void may become one of those key posts for me, and it seems it could have been written much earlier by Janus himself.

I think Janus is closer to "AI safety mainstream" than nostalgebraist?

1Lucie Philippon
AFAIK Janus does not publish posts on LessWrong to detail what he discovered and what it implies for AI Safety strategy.

Uh? The OpenAssistant dataset would qualify as supervised learning/fine-tuning, not RLHF, no?

1ConcurrentSquared
Yeah, it would. Sorry, the post is now corrected.

Would it be worth it to train a series of base models with only data up to year X for different values of X and see the consequences on alignment of derived assistant models?

1ConcurrentSquared
Yes, though note that there is a very good chance that there isn't enough easily accessible and high-quality data to create effective pre-2015 LLMs. As you go back in time, exponentially less data is available[1]: ~94 ZB of digital data was created in 2022, while only ~15.5 ZB was created in 2015, and only ~2 ZB in 2010.

Also, you may run into trouble trying to find conversational datasets not contaminated with post-2022 data. The earliest open dataset for LLM assistant fine-tuning is, I believe, the first OpenAssistant Conversations Dataset, released 6 months after the launch of ChatGPT. Some form of RLAIF/'unsupervised' assistant fine-tuning is probably a much better choice for this task, but I don't even know if it would work well for this sort of thing.

Edit: Apparently Anthropic researchers have just published a paper describing a new form of unsupervised fine-tuning, and it performs well on Alpaca and TruthfulQA - pre-ChatGPT conversational fine-tuning can be done effectively without any time machines.

1. ^ Or without the paywall: https://www.researchgate.net/figure/Worldwide-Data-Created-from-2010-to-2024-Source-https-wwwstatistacom-statistics_fig1_355069187
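As a quick sanity check on "exponentially less data", here is a small back-of-the-envelope calculation using the figures cited above (the zettabyte numbers come from the linked chart; the growth-rate arithmetic is mine):

```python
# Back-of-the-envelope: implied compound annual growth of global data creation,
# using the zettabyte figures cited above (2 ZB in 2010, 15.5 ZB in 2015, 94 ZB in 2022).

data_created_zb = {2010: 2.0, 2015: 15.5, 2022: 94.0}

def cagr(start_year: int, end_year: int) -> float:
    """Compound annual growth rate between two of the cited data points."""
    years = end_year - start_year
    return (data_created_zb[end_year] / data_created_zb[start_year]) ** (1 / years) - 1

print(f"2010-2015: {cagr(2010, 2015):.0%} per year")  # roughly +50%/year
print(f"2015-2022: {cagr(2015, 2022):.0%} per year")  # roughly +30%/year
```

On these figures, a 2015 training cutoff means roughly six times less raw data than a 2022 cutoff, before any quality filtering.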

@alapmi's post seems like it should be a question and not a regular post. Is it possible to change this after the fact?

1Marius Adrian Nicoară
I think editing should be possible. Not sure about deleting it entirely.

Interesting analysis, but this statement is a bit strong. A global safe AI project would be theoretically possible, but it would be extremely challenging to solve the co-ordination issues without AI progress dramatically slowing. Then again, all plans are challenging/potentially impossible.

[...]

Another option would be to negotiate a deal where only a few countries are allowed to develop AGI, but in exchange, the UN gets to send observers and provide input on the development of the technology.

"co-ordination issues" is a major euphemism here: such a global safe... (read more)

An aligned ASI, if it were possible, would be capable of a degree of perfection beyond that of human institutions.

The corollary of this is that an aligned ASI in the strong sense of "aligned" used here would have to dissolve currently existing human institutions, and the latter will obviously oppose that. As it stands, even if we solve technical alignment (which I do think is plausible at this rate), we'll end up with either an ASI aligned to a nation-state, or a corporate ASI turning all available matter into economium, both of which are x-risks in the longt... (read more)

2Chris_Leong
Interesting analysis, but this statement is a bit strong. A global safe AI project would be theoretically possible, but it would be extremely challenging to solve the co-ordination issues without AI progress dramatically slowing. Then again, all plans are challenging/potentially impossible.

Alternatively, an aligned ASI could be explicitly instructed to preserve existing institutions. Perhaps it'd be limited to providing advice, or (stronger) it wouldn't intervene except by preventing existential or near-existential risks.

Yet another possibility is that the world splits into factions which produce their own AGIs and then these AGIs merge.

A fourth option would be to negotiate a deal where only a few countries are allowed to develop AGI, but in exchange, the UN gets to send observers and provide input on the development of the technology.

Oh wow that's surprising, I thought Ted Chiang was an AI skeptic not too long ago?

[This comment is no longer endorsed by its author]
5Steven Byrnes
See my other comment. I find it distressing that multiple people here are evidently treating acknowledgements as implying that the acknowledged person endorses the end product. I mean, it might or might not be true in this particular case, but the acknowledgement is no evidence either way. (For my part, I’ve taken to using the formula “Thanks to [names] for critical comments on earlier drafts”, in an attempt to preempt this mistake. Not sure if it works.)

@nostalgebraist @Mantas Mazeika "I think this conversation is taking an adversarial tone." If this is how the conversation is going, it might be best to end it here and work on a, well, adversarial collaboration outside the forum.

It does seem that the LLMs are subject to deontological constraints (Figure 19), but I think that in fact makes the paper's framing of questions as evaluations between world-states rather than specific actions more apt for evaluating whether LLMs have utility functions over world-states behind those deontological constraints. Your reinterpretation of how those world-state descriptions are actually interpreted by LLMs is an important remark and certainly changes the conclusions we can make from this article with regard to implicit bias, but (unless you debunk tho... (read more)

... I don't agree, but would it at least be relevant that the "soft CCP-approved platitudes" are now AI-safetyist?

So that answers your question "Why does the linked article merit our attention?", right?

-2Richard_Kennaway
No. Nothing but soft CCP-approved platitudes can be expected from such a person writing in such a venue. That is her job. China matters, but not everything that it says matters, unless to Pekingologists minutely examining the tea-leaves for insight into whatever is really going on in China. What about my other two points?

Why does the linked article merit our attention?

  • It is written by a Chinese former politician in a Chinese-owned newspaper.

?

2Richard_Kennaway
It is written by a Chinese former politician in a Chinese-owned newspaper.

I'm not convinced "almost all sentient beings on Earth" would pick, out of the blue (i.e. without chain of thought), the reflectively optimal option at least 60% of the time when asked for unconstrained responses (i.e. not even an MCQ).

The most important part of the experimental setup is "unconstrained text response". If in the largest LLMs 60% of unconstrained text responses wind up being "the outcome it assigns the highest utility", then that's surely evidence for "utility maximization" and even "the paperclip hyper-optimization caricature". What more do you want exactly?

1Archimedes
It's hard to say what is wanted without a good operating definition of "utility maximizer". If the definition is weak enough to include any entity whose responses are mostly consistent across different preference elicitations, then what the paper shows is sufficient. In my opinion, having consistent preferences is just one component of being a "utility maximizer". You also need to show it rationally optimizes its choices to maximize marginal utility. This excludes almost all sentient beings on Earth rather than including almost all of them under the weaker definition.
2Writer
But the "unconstrained text responses" part is still about asking the model for its preferences even if the answers are unconstrained. That just shows that the results of different ways of eliciting its values remain sorta consistent with each other, although I agree it constitutes stronger evidence. Perhaps a more complete test would be to analyze whether its day to day responses to users are somehow consistent with its stated preferences and analyzing its actions in settings in which it can use tools to produce outcomes in very open-ended scenarios that contain stuff that could make the model act on its values.

This doesn't contradict the Thurstonian model at all. This only shows that order effects are one of the many factors contributing to utility variance, which is itself a component of the Thurstonian model. Why should they be considered differently than any other such factor? The calculations still show that utility variance (including order effects) decreases with scale (Figure 12); you don't need to eyeball a few examples from a Twitter thread about a single factor.
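For readers without the paper open: in the Thurstonian setup (as I understand the paper's use of it; the exact parameterization is theirs and not quoted here), each outcome's utility is a Gaussian, and pairwise preference probabilities come from the difference of the two Gaussians:

$$
U(o_i) \sim \mathcal{N}(\mu_i, \sigma_i^2), \qquad
P(o_i \succ o_j) = \Phi\!\left(\frac{\mu_i - \mu_j}{\sqrt{\sigma_i^2 + \sigma_j^2}}\right)
$$

Order effects, framing, and any other elicitation noise all get absorbed into the variance terms $\sigma_i^2$, which is why the Figure 12 result that variance shrinks with scale is the relevant statistic, rather than any single noise source on its own.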

If that were the case, we wouldn't expect to have those results about the VNM consistency of such preferences.

2Kaj_Sotala
Maybe? We might still have consistent results within this narrow experimental setup, but it's not clear to me that it would generalize outside that setup.

There's a more complicated model, but the bottom line is still questions along the lines of "Ask GPT-4o whether it prefers N people of nationality X vs. M people of nationality Y" (per your own quote). Your questions would be confounded by deontological considerations (see section 6.5 and Figure 19).

The outputs being shaped by cardinal utilities and not just consistent ordinal utilities would be covered in the "Expected Utility Property" section, if that's your question.
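To spell out the distinction, under my reading of that section: ordinal consistency only requires a stable ranking, while the expected utility property requires the utility assigned to a lottery to track the probability-weighted sum of its outcomes' utilities, roughly

$$
U(L) \approx \sum_i p_i \, U(o_i) \quad \text{for a lottery } L = \{(o_i, p_i)\},
$$

which is a cardinal constraint that a merely rank-consistent responder would have no particular reason to satisfy.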

My question is: why do you say "AI outputs are shaped by utility maximization" instead of "AI outputs to simple MC questions are self-consistent"? Do you believe these two things mean the same thing, or that they are different and you've shown the former and not only the latter?

I don't see why it should improve faster. It's generally held that the increase in interpretability of larger models is due to larger models having better representations (that's why we prefer larger models in the first place); why should the scaling behavior be any different for normative representations?

This interpretation is straightforwardly refuted (insofar as it makes any positivist sense) by the success of the parametric approach in "Internal Utility Representations" also being correlated with model size.

2Gurkenglas
This does go in the direction of refuting it, but they'd still need to argue that linear probes improve with scale faster than they do for other queries; a larger model means there are more possible linear probes to pick the best from.
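For concreteness, a linear probe in this context is something like the following minimal sketch (hypothetical variable names, not the paper's actual code): a ridge regression from hidden activations to the fitted Thurstonian utilities.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical inputs: hidden_states[i] is the model's activation vector for outcome i
# (e.g. at the final token of the outcome description); utilities[i] is the Thurstonian
# mean utility fitted from the model's pairwise preferences.
hidden_states = np.random.randn(500, 4096)   # placeholder for real activations
utilities = np.random.randn(500)             # placeholder for fitted utilities

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, utilities, test_size=0.2, random_state=0
)

probe = Ridge(alpha=1.0).fit(X_train, y_train)
print("held-out R^2:", probe.score(X_test, y_test))
```

Gurkenglas's point, phrased in these terms, is that a 4096-dimensional probe has more capacity than a smaller model's probe, so held-out R^2 rising with model size is only evidence of better utility representations if it rises faster than probe accuracy does for unrelated target quantities.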

I think this is largely duplicating Uncle Kenny's already excellent work (linked in the initial thread) and not a good idea.

This is duplicating Uncle Kenny's already very extensive work linked in the OP.

From zizians.info:

All four of the people arrested as part of Ziz's protest were transgender women (the fifth was let go without charges). This is far from coincidence as Ziz seems to go out of her way to target transgender people. In terms of cult indoctrination such folks are an excellent fit. They're often:

  • Financially vulnerable.
    • Newly out transgender people are especially likely to already be estranged from friends or family.
    • It is common for them to lack stable housing.
    • Many traditional social services (illegally) reject them for cultural or religious rea
... (read more)
3Liskantope
Regarding your point about being bigender, I recall that Suri Dao, as I knew them on Tumblr around 2016-2019-ish, identified as bigender, and indeed it was the first time I'd ever heard of the term or concept. I'm not sure I've heard anyone else describe themself that way since, and I never really understood the concept then as Dao tried to explain it or since then either. (The closest I can approximate it to a gender identity I do kind of understand is "genderfluid".) I don't think Dao ever mentioned the hemispheric stuff in connection to it. Is/was this a widespread gender identity among rationalists or even in the wider population that I've been ignorant of? Or is it mainly a concept found among those who subscribe to Ziz's ideas?

I don't really want to go through sinceriously.fyi at this point, but it's implicit at least in her attacks on CFAR as "transphobic" for not accepting her belief system.

6Viliam
No specific link either, but if you know the usual "female brain in a male body" explanation, Ziz kinda has a more nuanced version of this, where each brain hemisphere is a separate personality, so you can have e.g. one male and one female hemisphere in a male body. (And "if you don't believe an X person when they interpret their own lived experience, that makes you X-phobic" is a standard woke trope.)

In the largest LW survey, 10.5% of users were transgender. This also increases the deeper in the community you are: 18% restricting to those who are either "sometimes" or "all the time" in the community, 21% restricting to those who are "all the time" in the community.

2Lukas Finnveden
Source? I thought 2016 had the most takers but that one seems to have ~5% trans. The latest one with results out (2023) has 7.5% trans. Are you counting "non-binary" or "other" as well? Or referring to some other survey.
9Viliam
Oh. I somehow missed/forgot that. I guess it makes more sense this way. Like, the more transgenders there are in the community, the smaller the fraction of Zizians among them. With the numbers I originally assumed, Ziz's conversion ratio would be shockingly high. Now it makes more sense. Thank you, this changes my perspective on the situation.

(not OP) High base rates of transgenderism in LW-rationalism, particularly in the sections that would be most receptive to tenets of Ziz's ideology (high interest in technical aspects of mathematical decision theory, animal rights, radical politics), while being on average more socially vulnerable; and Ziz herself apparently believed that trans women were inherently more capable of accepting her "truth" for more mystical g/acc-ish reasons (though I can't find first-hand confirmation rn)

3Mateusz Bagiński
I'm aware of high rates among LWers, but it's still far from what we see among the Zizians that we hear a lot about. Interesting.

You are referring to Pasek with male pronouns despite the consensus of all the sources provided in the OP. Considering you claim to have known Pasek, I would like you to confirm that you're doing so because you have first-hand information not known to any of the writers of the sources in the OP, and that I'm only getting the contrary impression because your last posts on the forum were about how doing genetics studies in medicine is "DEI".

[This comment is no longer endorsed by its author]

In the time I was interacting with Pasek, he was male. In the interaction with Ziz (as far as I can assess from the data Ziz published), they adopted Ziz's idea of being bigender, with one hemisphere being male and the other female.

The Chris Pasek I met was very much into TDT. Maia, the female personality that developed in the interaction with Ziz, cared more about feeling good. As far as I understand, the post laying out the case for committing suicide was not written by the female personality but the male one.

I know that I can create a male or ... (read more)

2Viliam
Pasek was a trans woman, Christian mentions in another comment having been in contact with Pasek 8 or 9 years ago, so... could it be the pronouns Pasek actually used back then?

TBF it is fairly striking, reading about early Soviet history, how many of the Old Bolshevik intelligentsia would have fit right into this community, but the whole "Putin is a secret cosmist" crowd is... unhinged.

@PhilGoetz's Reason as memetic immune disorder seems relevant here. It has been noted many times that engineers are disproportionately involved in terrorism, in ways that the mere usefulness of their engineering skills can't explain.

8Viliam
Teaching rationality the shallow way -- nope; knowing about biases can hurt people.

Teaching rationality the deep way -- nope; reason as a memetic immune disorder :(

Perhaps there should be some "pre-rationality" lessons. Something stabilizing you need to learn first, so that learning about rationality does not make you crazy. There are some materials that already seem to point in that direction: adding up to normality, ethical injunctions. Perhaps the CFAR workshops should start with focusing on these things, in a serious way (like, spend at least one day only debating this, check that the participants understood the lesson, and maybe kick out those who didn't?).

Because, although some people get damaged by learning about rationality, it seems to me that many people don't (some of them only because they don't change in any significant way, but some of them internalize the lessons in a good way). If we could predict who would end up which way, that could allow us to reduce the damage, while still delivering the value.

Of course this only applies to the workshops; online communication is a different question. But it seems to me that the bad things mostly happen offline.

As documented in the 2023 Medium article, Ziz has been threatening to murder rationalists for a while, and I'm aware prominent rationalists have been paranoid about possible attempts on their lives by Zizians for the past few years. Aella has also recently stated on Twitter that she wouldn't accept an interview on the subject without an upgraded security system on her house.

8Friendly Monkey
I see. Well, in case one of them reads this: to the extent it might matter to you, as a fellow radical vegan I request you not kill anyone trying to figure out how to make the first takeover-capable AI be benevolent. (I'm aware that humans are such that it's not obvious that designer-chosen-alignment → benevolence, and that one does actually have to reason about that instead of just assuming the hopeful option, but after having reasoned about it, I think most would choose benevolence in the end.[1] At least, I do not expect important exceptions to be alignment researchers, as opposed to selected-for power-seekers in high positions.)

1. ^ It seems to me that actually reflectively endorsing evil (objectively describing what's happening and then being like "this is good", instead of euphemizing and coping with arguments that often secretly amount to "if you're right then the world would be really bad, and that's scary" like "but nature has it") is rare, and most people are actually altruistic on some level but conformism overrides that and animal abuse is so normal that they usually don't notice it, but then when they see basic information like "cows have best friends and get stressed when they are separated" they seemingly get empathetic and reflective (check the comments).

Surveilling whose activities?

Core Zizians (and, in general, any group determined to be a cult or terror threat to the community), as the US doesn't really have an equivalent of, say, the French MIVILUDES to do that job (else US society would be fairly different). Potential recruits are addressed in the next comma.

TBF, Torres denies using it to mean this, instead claiming it refers to some obscure 2010 article by Ben Goertzel alone. This doesn't seem a very credible excuse, and it has been largely understood by proponents of the theory (like Dave Troy or Céline Keller) to mean Russian cosmism (and consequently that "TESCREAL" is actually a plot by Russian intelligence to re-establish the Soviet Union).

2AprilSR
That is such a bizarre claim to make but admittedly including Cosmism at all is really odd 

People who use the term TESCREAL generally don't realize that science fiction authors often take the futures they write about seriously (if not literally). They will talk about "TESCREALists taking sci-fi books too seriously" without knowing that Marvin Minsky, the AI pioneer whose "AI tasked to solve the Riemann hypothesis" thought experiment is effectively the origin of the paperclip-maximizer thought experiment, was the technical consultant for 2001: A Space Odyssey and was considered by Isaac Asimov to be one of the two smartest people he ever met (alongside cosmist Carl Sagan).

Or is this all just bad luck... that if you make a workshop, and a future murderer decides to go there, and they decide to use some of your keywords in their later manifesto... then it doesn't really matter what you do, even if you tell them to fuck off and call cops on them, you will forever be connected to them, and it's up to journalists whether they decide to spin it as: the murderer is just an example of everything that is wrong with this community.

I think this is a strange description of the mainstream media coverage when most of the articles talking... (read more)

2Viliam
Surveilling whose activities? For example, there is a YouTube video with Slimepriestess, who defends Zizians a bit too much, in my opinion. Should we be surveilling Slimepriestess?

I chose a specific person and an exaggerated example on purpose. Because, in a real situation, it will always be a specific person, and unless the behavior is really bad (which means it is already late), any proposed action will seem excessive to some people. And it will feel virtuous to err on the side of letting people do whatever they want.

(And if the police is already looking for Ziz, you don't need to surveil. If you see Ziz, pick up the phone and call the cops, don't try anything heroic, or the next story might be about you.)

I think we still haven't reached a consensus on whether Nonlinear are bad guys. That was a year ago. So... yeah, we should do something like that, but it is difficult to get the details right.

Violence by radical vegans and left-anarchists has historically not been extremely rare. Nothing in the Zizians' actions strikes me as particularly different (in kind if not in competency) from, say, the Belle Époque illegalists like the Bonnot Gang, or the Years of Lead leftist groups like the Red Army Faction or the Weather Underground.

I think there are a lot of people out there who will be willing to tell the Ziz sympathetic side of the story. (I mean, I would if asked, though "X did little wrong" seems pretty insane for most people involved and especially for Ziz). Like, I think there's a certain sort of left anarchismish person who is just, going to be very inclined to take the broke crazy trans women's side as much as it's possible to do so. It doesn't seem possible or even necessarily desirable to track every person with a take like that... whereas with people very very into Ziziani

... (read more)
4AnonymousAcquaintance
It might be useful to have somewhere a chart of legal names and chosen names so that those of us not on rationalist twitter can keep track. Ivory is the Max who has been charged with Mr. Lind's murder?

My impression is that (without even delving into any meta-level IR theory debates) Democrats are more hawkish on Russia while Republicans are more hawkish on China. So while obviously neither party is kum-ba-yah and both ultimately represent US interests, it still makes sense to expect each party to be less receptive to the idea of ending any potential arms race against the country they consider an existential threat to US interests if left unchecked, so the party that is more hawkish on a primarily military superpower would be worse on nuclear x-risk, ... (read more)

Fortunately, the existential risks posed by AI are recognized by many close to President-elect Donald Trump. His daughter Ivanka seems to see the urgency of the problem. Elon Musk, a critical Trump backer, has been outspoken about the civilizational risks for many years, and recently supported California’s legislative push to safety-test AI. Even the right-wing Tucker Carlson provided common-sense commentary when he said: “So I don’t know why we’re sitting back and allowing this to happen, if we really believe it will extinguish the human race or enslave t

... (read more)
3otto.barten
I'm aware and I don't disagree. However, in xrisk, many (not all) of those who are most worried are also most bullish about capabilities. Conversely, many (not all) who are not worried are unimpressed with capabilities. Being aware of the concept of AGI, that it may be coming soon, and of how impactful it could be, is in practice often a first step towards becoming concerned about the risks too. This is not true for everyone, unfortunately. Still, I would say that, at least for our chances of getting an international treaty passed, it is perhaps hopeful that the power of AGI is on the radar of leading politicians (although this may also increase risk through other paths).