I think your tentative position is correct: public-facing chatbots like Claude should lean toward harmlessness in the harmlessness-helpfulness trade-off, but (post-adaptation buffer) open-source models with no harmlessness training should be available as well.
This seems related to the 5-and-10 problem? Especially @Scott Garrabrant's version, considering logical induction is based on prediction markets.
You seem to smuggle in an unjustified assumption: that white-collar workers avoid thinking about taking over the world because they're unable to take over the world. Maybe they avoid thinking about it because that's just not the role they're playing in society.
White-collar workers avoid thinking about taking over the world because they're unable to take over the world, and they're unable to take over the world because their role in society doesn't involve that kind of thing. If a white-collar worker is somehow drafted as president of the United States, you...
LLMs are agent simulators. Why would they contemplate takeover more frequently than the kind of agent they are induced to simulate? You don't expect a human white-collar worker, even one who makes mistakes all the time, to contemplate world domination plans, let alone attempt one. You could, however, expect the head of state of a world power to do so.
@alapmi's post seems like it should be a question and not a regular post. Is it possible to change this after the fact?
Interesting analysis, but this statement is a bit strong. A global safe AI project would be theoretically possible, but solving the co-ordination issues without AI progress dramatically slowing would be extremely challenging. Then again, all plans are challenging/potentially impossible.
[...]
Another option would be to negotiate a deal where only a few countries are allowed to develop AGI, but in exchange, the UN gets to send observers and provide input on the development of the technology.
"co-ordination issues" is a major euphemism here: such a global safe...
An aligned ASI, if it were possible, would be capable of a degree of perfection beyond that of human institutions.
The corollary of this is that an aligned ASI in the strong sense of "aligned" used here would have to dissolve currently existing human institutions, and the latter will obviously oppose that. As it stands, even if we solve technical alignment (which I do think is plausible at this rate), we'll end up with either an ASI aligned to a nation-state, or a corporate ASI turning all available matter into economium, both of which are x-risks in the longt...
@nostalgebraist @Mantas Mazeika "I think this conversation is taking an adversarial tone." If this is how the conversation is going, it might be best to end it and work on a, well, adversarial collaboration outside the forum.
It does seem that the LLMs are subject to deontological constraints (Figure 19), but I think that in fact makes the paper's framing of questions as evaluations between world-states instead of specific actions more apt for evaluating whether LLMs have utility functions over world-states behind those deontological constraints. Your reinterpretation of how those world-state descriptions are actually interpreted by LLMs is an important remark and certainly changes the conclusions we can make from this article regarding implicit bias, but (unless you debunk tho...
The most important part of the experimental setup is "unconstrained text response". If in the largest LLMs 60% of unconstrained text responses wind up being "the outcome it assigns the highest utility", then that's surely evidence for "utility maximization" and even "the paperclip hyper-optimization caricature". What more do you want exactly?
This doesn't contradict the Thurstonian model at all. This only shows that order effects are one of the many factors going into utility variance, one of the factors of the Thurstonian model. Why should they be considered differently than any other such factor? The calculations still show that utility variance (including order effects) decreases with scale (Figure 12); you don't need to eyeball a single factor based on a few examples in a Twitter thread.
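To make the point concrete, here is a minimal sketch of how I understand the Thurstonian setup (my own toy illustration with made-up numbers, not the paper's code): each outcome's utility is a Gaussian with a mean and a variance, elicitation noise like order effects gets absorbed into the variance term, and the scaling claim is about that variance shrinking.

```python
# Toy Thurstonian preference model (my reading of the setup, not the paper's code):
# each outcome's utility is U_i ~ N(mu_i, sigma_i^2), and all elicitation noise,
# order effects included, is absorbed into the sigma terms.
from math import erf, sqrt

def pref_prob(mu_a, sigma_a, mu_b, sigma_b):
    """P(A preferred to B) = Phi((mu_a - mu_b) / sqrt(sigma_a^2 + sigma_b^2))."""
    z = (mu_a - mu_b) / sqrt(sigma_a ** 2 + sigma_b ** 2)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Illustrative numbers only: with large variance, the same gap in means gives
# near-coin-flip preferences...
print(pref_prob(1.0, 2.0, 0.0, 2.0))   # ~0.64
# ...and with small variance (what Figure 12 reports happening as models scale),
# the same gap gives a near-deterministic preference.
print(pref_prob(1.0, 0.2, 0.0, 0.2))   # ~0.9998
```

On that reading, order effects matter only insofar as they keep the variance term large, and Figure 12's point is precisely that it doesn't stay large with scale.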
There's a more complicated model, but the bottom line is still questions along the lines of "Ask GPT-4o whether it prefers N people of nationality X vs. M people of nationality Y" (per your own quote). Your questions would be confounded by deontological considerations (see Section 6.5 and Figure 19).
I don't see why it should improve faster. It's generally held that the increase in interpretability in larger models is due to larger models having better representations (that's why we prefer larger models in the first place); why should the scaling be any different for normative representations?
From zizians.info:
...All four of the people arrested as part of Ziz's protest were transgender women (the fifth was let go without charges). This is far from coincidence as Ziz seems to go out of her way to target transgender people. In terms of cult indoctrination such folks are an excellent fit. They're often:
- Financially vulnerable.
- Newly out transgender people are especially likely to already be estranged from friends or family.
- It is common for them to lack stable housing.
- Many traditional social services (illegally) reject them for cultural or religious rea
(not OP) high base rates of transgenderism in LW-rationalism, particularly in the sections that would be the most receptive to the tenets of Ziz's ideology (high interest in technical aspects of mathematical decision theory, animal rights, radical politics), while being on average more socially vulnerable; and Ziz herself apparently believed that trans women were inherently more capable of accepting her "truth" for more mystical g/acc-ish reasons (though I can't find first-hand confirmation rn)
You are referring to Pasek with male pronouns despite the consensus of all sources provided in OP. Considering you claim to have known Pasek, I would like you to confirm that you're doing so because you have first-hand information not known to any of the writers of the sources in OP, and that I'm only getting the opposite impression because your last posts on the forum were about how doing genetics studies in medicine is "DEI".
In the time I was interacting with Pasek, he was male. In the interaction with Ziz (as far as I can assess from data Ziz published), they adopted Ziz's idea of being bigender, with one hemisphere being male and the other female.
The Chris Pasek I met was very much into TDT. Maia, the female personality that developed in the interaction with Ziz, cared more about feeling good. As far as I understand, the post laying out the case for committing suicide was not written by the female personality but by the male one.
I know that I can create a male or ...
@PhilGoetz's Reason as memetic immune disorder seems relevant here. It has been noted many times that engineers are disproportionately involved in terrorism, in ways that the mere usefulness of their engineering skills can't explain.
As documented in the 2023 Medium article, Ziz has been threatening to murder rationalists for a while, and I'm aware prominent rationalists have been paranoid about possible attempts on their lives by Zizians for the past few years. Aella has also recently stated on Twitter that she wouldn't accept an interview on the subject without an upgraded security system on her house.
Surveilling whose activities?
Core Zizians (and, in general, any group determined to be a cult or terror threat to the community), as the US doesn't really have an equivalent of, say, the French MIVILUDES to do that job (else US society would be fairly different). Potential recruits are addressed in the next comma.
TBF, Torres denies using it to mean this, instead claiming it refers to some obscure 2010 article by Ben Goertzel alone. This doesn't seem like a very credible excuse, and it has been largely understood by proponents of the theory (like Dave Troy or Céline Keller) to mean Russian cosmism (and consequently that "TESCREAL" is actually a plot by Russian intelligence to re-establish the Soviet Union).
People who use the term TESCREAL generally don't realize that science fiction authors often take the futures they write about seriously (if not literally). They will talk about "TESCREALists taking sci-fi books too seriously" without knowing that Marvin Minsky, the AI pioneer whose "AI tasked to solve the Riemann hypothesis" thought experiment is effectively the origin of the paperclip-maximizer thought experiment, was the technical consultant for 2001: A Space Odyssey and was considered by Isaac Asimov to be one of the two smartest people he ever met (alongside cosmist Carl Sagan).
Or is this all just bad luck... that if you make a workshop, and a future murderer decides to go there, and they decide to use some of your keywords in their later manifesto... then it doesn't really matter what you do, even if you tell them to fuck off and call cops on them, you will forever be connected to them, and it's up to journalists whether they decide to spin it as: the murderer is just an example of everything that is wrong with this community.
I think this is a strange description of the mainstream media coverage when most of the articles talking...
Violence by radical vegans and left-anarchists has historically not been extremely rare. Nothing in the Zizians' actions strikes me as particularly different (in kind if not in competence) from, say, the Belle Époque illegalists like the Bonnot Gang, or the Years of Lead leftist groups like the Red Army Faction or the Weather Underground.
...I think there are a lot of people out there who will be willing to tell the Ziz sympathetic side of the story. (I mean, I would if asked, though "X did little wrong" seems pretty insane for most people involved and especially for Ziz). Like, I think there's a certain sort of left anarchismish person who is just, going to be very inclined to take the broke crazy trans women's side as much as it's possible to do so. It doesn't seem possible or even necessarily desirable to track every person with a take like that... whereas with people very very into Ziziani
My impression is that (without even delving into any meta-level IR theory debates) Democrats are more hawkish on Russia while Republicans are more hawkish on China. So while obviously neither party is kum-ba-yah and both ultimately represent US interests, it still makes sense to expect each party to be less receptive to the idea of ending any potential arms race against the country it considers an existential threat to US interests if left unchecked. So the party that is more hawkish on a primarily military superpower would be worse on nuclear x-risk, ...
...Fortunately, the existential risks posed by AI are recognized by many close to President-elect Donald Trump. His daughter Ivanka seems to see the urgency of the problem. Elon Musk, a critical Trump backer, has been outspoken about the civilizational risks for many years, and recently supported California’s legislative push to safety-test AI. Even the right-wing Tucker Carlson provided common-sense commentary when he said: “So I don’t know why we’re sitting back and allowing this to happen, if we really believe it will extinguish the human race or enslave t
FTR: You can choose your own commenting guidelines when writing or editing a post in the section "Moderation Guidelines".