LESSWRONG
LW

All of iceman's Comments + Replies

Arguments for optimism on AI Alignment (I don't endorse this version, will reupload a new version soon.)

But POC||GTFO is really important to constraining your expectations. We do not really worry about Rowhammer since the few POCs are hard, slow and impractical. We worry about Meltdown and other speculative execution attacks because Meltdown shipped with a POC that read passwords from a password manager in a different process, was exploitable from within Chrome's sandbox, and my understanding is that POCs like that were the only reason Intel was made to take it seriously.

Meanwhile, Rowhammer is maybe a real issue but is so hard to pull off consistently and s... (read more)

2niplav1y

I'm confused about how POC||GTFO fits together with cryptographers starting to worry about post-quantum cryptography already in 2006, when the proof of concept was we have factored 15 into 3×5 using Shor's algorithm? (They were running a whole conference on it!)

Arguments for optimism on AI Alignment (I don't endorse this version, will reupload a new version soon.)

iceman1y184

On the topic of security mindset, the thing that the LW community calls "security mindset" isn't even an accurate rendition of what computer security people would call security mindset. As noted by lc, actual computer security mindset is POC || GTFO, or trying to translate that into lesswrongesse, you do not have warrant to believe in something until you have an example of the thing you're maybe worried about being a real problem because you are almost certain to be privileging the hypothesis.

lc1y142

POC || GTFO is not "security mindset", it's a norm. It's like science in that it's a social technology for making legible intellectual progress on engineering issues, and allows the field to parse who is claiming to notice security issues to signal how smart they are vs. who is identifying actual bugs. But a lack of "POC || GTFO" culture doesn't tell you that nothing is wrong, and demanding POCs for everything obviously doesn't mean you understand what is and isn't secure. Or to translate that into lesswrongese, reversed stupidity is not intelligence.

Zach Furman1y1813

In the cybersecurity analogy, it seems like there are two distinct scenarios being conflated here:

1) Person A says to Person B, "I think your software has X vulnerability in it." Person B says, "This is a highly specific scenario, and I suspect you don't have enough evidence to come to that conclusion. In a world where X vulnerability exists, you should be able to come up with a proof-of-concept, so do that and come back to me."

2) Person B says to Person A, "Given XYZ reasoning, my software almost certainly has no critical vulnerabilities of any kind. I'm ... (read more)

Steven Byrnes1y169

At the very least I think it would be more accurate to say “one aspect of actual computer security mindset is POC || GTFO”. Right? Are you really arguing that there’s nothing more to it than that?? That seems insane to me.

Even leaving that aside, here’s a random bug thread:

Mozilla developers identified and fixed several stability bugs in the browser engine used in Firefox and other Mozilla-based products. Some of these crashes showed evidence of memory corruption under certain circumstances and we presume that with enough effort at least some of these coul

... (read more)

Daniel Kokotajlo1y104

Citation needed? The one computer security person I know who read Yudkowsky's post said it was a good description of security mindset. POC||GTFO sounds useful and important too but I doubt it's the core of the concept.

Also, if the toy models, baby-AGI-setups like AutoGPT, and historical examples we've provided so far don't meet your standards for "example of the thing you're maybe worried about" with respect to AGI risk, (and you think that we should GTFO until we have an example that meets your standards) then your standards are way too high.

If instead PO... (read more)

AI romantic partners will harm society if they go unregulated

iceman1y4718

Are AI partners really good for their users?

Compared to what alternative?

As other commenters have pointed out, the baseline is already horrific for men, who are suffering. Your comments in the replies seem to reject that these men are suffering. No, obviously they are.

But responding in depth would just be piling on and boring, so instead let's say something new:

I think it would be prudent to immediately prohibit AI romance startups to onboard new users[..]

You do not seem to understand the state of the game board: AI romance startups are dead, and we... (read more)

-4Roman Leventov1y

I didn't deny that some people actually suffer (probably big portion of them are clinically depressed, though, so would "qualify" to use AI partners before 30 in my proposal), I just said that it's by no means normal to suffer if you just don't have a romantic partner, but otherwise your life is "ok". See this comment. This perverted strategy of ameliorating symptoms of problems (such as social media, problems with (sexual) self-image and expectations, "dating market", social isolation, etc.) because it will provide a constant stream of $$$, instead of treating the root causes, is what bugs me. I'm far from convinced that "society" (represented by payment providers) already decided to ban sex bots. It doesn't make much sense, given that payment providers do serve porn industry, OnlyFans, etc. Maybe the issue was that character.AI didn't clearly market itself as "adult startup" and didn't impose age restrictions, and providers saw potential legal risks with this? I see https://www.evaapp.ai/ is growing, https://caryn.ai/ is growing, they aren't getting banned. I generally worry much less about hardcore enthusiasts to develop and host open-source waifus, of course fundamentally this cannot be banned, but the potential reach of these will be an order or two orders of magnitude smaller than easy-to-use mobile app. I don't worry about society-scale effects from people using open-source waifus.

A Hill of Validity in Defense of Meaning

iceman2y14-1

So, I started off with the idea that Ziz's claims about MIRI were frankly crazy...because Ziz was pretty clearly crazy (see their entire theory of hemispheres, "collapse the timeline," etc.) so I marked most of their claims as delusions or manipulations and moved on, especially since their recounting of other events on the page where they talked about miricult (which is linked in OP) comes off as completely unhinged.

But Zack confirming this meeting happened and vaguely confirming its contents completely changes all the probabilities. I now need to go back ... (read more)

lc2y154

It's obviously not defamation since Ziz believes its true.

We're veering dangerously close into dramaposting here, but just FYI habyka has already contested that they ever said this. I would like to know if the ban accusations are true, though.

-41green_leaf2y

A Hill of Validity in Defense of Meaning

iceman2y13-28

The second half (just live off donations?) is also my interpretation of OP. The first half (workable alignment plan?) is my own intuition based on MIRI mostly not accomplishing anything of note over the last decade, and...

MIRI & company spent a decade working on decision theory which seems irrelevant if deep learning is the path (aside: and how would you face Omega if you were the sort of agent that pays out blackmail?). Yudkowsky offers to bet Demis Hassabis that Go won't be solved in the short term. They predict that AI will only come from GOFAI AIX... (read more)

7jimrandomh2y

This doesn't seem consistent to me with MIRI having run a research program with a machine learning focus. IIRC (I don't have links handy but I'm pretty sure there were announcements made) that they wound up declaring failure on that research program, and it was only after that happened that they started talking about the world being doomed and there not being anything that seemed like it would work for aligning AGI in time.

jimrandomh2y6242

Deep Learning systems don't look like they FOOM. Stochastic Gradient Descent doesn't look like it will treacherous turn.

I think you've updated incorrectly, by failing to keep track of what the advance predictions were (or would have been) about when a FOOM or a treacherous turn will happen.

If foom happens, it happens no earlier than the point where AI systems can do software-development on their own codebases, without relying on close collaboration with a skilled human programmer. This point has not yet been reached; they're idiot-savants with skill gaps t... (read more)

4Said Achmiz2y

Incidentally, I don’t think I’m willing to trust a hearsay report on this without confirmation. Do you happen to have any links to Eliezer making such a claim in public? Or, at least, any confirmation that the cited comment was made as described?

Said Achmiz2y358

They predict that AI will only come from GOFAI AIXI-likes with utility functions that will bootstrap recursively.

Do you have a link for this prediction? (Or are you just referring to, e.g., Eliezer’s dismissive attitude toward neural networks, as expressed in the Sequences?)

They predict fast takeoff and FOOM. … Deep Learning systems don’t look like they FOOM.

It’s not clear that deep learning systems get us to AGI, either. There doesn’t seem to be any good reason to be sure, at this time, that we won’t get “fast takeoff and FOOM”, does it? (Indeed it... (read more)

FireStormOOO2y146

It's pretty easy to find reasons why everything will hopefully be fine, or AI hopefully won't FOOM, or we otherwise needn't do anything inconvenient to get good outcomes. It's proving considerably harder (from my outside the field view) to prove alignment, or prove upper bounds on rate of improvement, or prove much of anything else that would be cause to stop ringing the alarm.

FWIW I'm considerably less worried than I was when the Sequences were originally written. The paradigms that have taken off since do seem a lot more compatible with strai... (read more)

A Hill of Validity in Defense of Meaning

iceman2y2-14

It's not exactly the point of your story, but...

Probably the most ultimately consequential part of this meeting was Michael verbally confirming to Ziz that MIRI had settled with a disgruntled former employee, Louie Helm, who had put up a website slandering them.

Wait, that actually happened? Louie Helm really was behind MIRICult? The accusations weren't just...Ziz being Ziz? And presumably Louie got paid out since why would you pay for silence if the accusations weren't at least partially true...or if someone were to go digging, they'd find things even ... (read more)

4Eli Tyre1y

FWIW, my current understanding is that this inference isn't correct. I think it's common practice to pay settlements to people, even if their claims are fallacious, since having an extended court battle is sometimes way worse.

habryka2y322

Wait, that actually happened? Louie Helm really was behind MIRICult? The accusations weren't just...Ziz being Ziz? And presumably Louie got paid out since why would you pay for silence if the accusations weren't at least partially true...or if someone were to go digging, they'd find things even more damning?

Louie Helm was behind MIRICult (I think as a result of some dispute where he asked for his job back after he had left MIRI and MIRI didn't want to give him his job back). As far as I can piece together from talking to people, he did not get paid out, bu... (read more)

Said Achmiz2y229

So Yudkowsky doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock.

Er… is anyone actually claiming this? This is quite the accusation, and if it were being made, I’d want to see some serious evidence, but… is it, in fact, being made?

(It does seem like OP is saying this, but… in a weird way that doesn’t seem to acknowledge the magnitude of the accusation, and treats it as a reasonable characterization of other claims made earlier in the post. But that doesn’t actually seem to make sense. Am I misreading, or what?)

Some reasons to not say "Doomer"

iceman2y20

Just to check, has anyone actually done that?

I'm thinking of a specific recent episode where [i can't remember if it was AI Safety Memes or Connor Leahy's twitter account] posted a big meme about AI Risk Deniers and this really triggered Alexandros Marinos. (I tried to use Twitter search to find this again, but couldn't.)

It's quite commonly used by a bunch of people at Constellation, Open Philanthropy and some adjacent spaces in Berkeley.

Fascinating. I was unaware it was used IRL. From the Twitter user viewpoint, my sense is that it's mostly used by people who don't believe in the AI risk narrative as a pejorative.

Some reasons to not say "Doomer"

iceman2y1210

Why are you posting this here? My model is that the people you want to convince aren't on LessWrong and that you should be trying to argue this on Twitter; you included screenshots from that site, after all.

(My model of the AI critics would be that they'd shrug and say "you started it by calling us AI Risk Deniers.")

habryka2y124

you started it by calling us AI Risk Deniers.

Just to check, has anyone actually done that? I don't remember that term used before. It's fine as an illustration, just trying to check whether this is indeed happening a bunch.

Why are you posting this here? My model is that the people you want to convince aren't on LessWrong and that you should be trying to argue this on Twitter; you included screenshots from that site, after all.

It's quite commonly used by a bunch of people at Constellation, Open Philanthropy and some adjacent spaces in Berkeley. It is ... (read more)

6Martin Randall2y

example example I may have used the term myself a few times.

My tentative best guess on how EAs and Rationalists sometimes turn crazy

iceman2y151

My understanding of your point is that Mason was crazy because his plans didn't follow from his premise and had nothing to do with his core ideas. I agree, but I do not think that's relevant.

I am pushing back because, if you are St. Petersberg Paradox-pilled like SBF and make public statements that actually you should keep taking double or nothing bets, perhaps you are more likely to make tragic betting decisions and that's because of you're taking certain ideas seriously. If you have galaxy brained the idea of the St. Petersberg Paradox, it seems like Ala... (read more)

dxu2y203

I am pushing back because, if you are St. Petersberg Paradox-pilled like SBF and make public statements that actually you should keep taking double or nothing bets, perhaps you are more likely to make tragic betting decisions and that's because of you're taking certain ideas seriously. If you have galaxy brained the idea of the St. Petersberg Paradox, it seems like Alameda style fraud is +EV.

This is conceding a big part of your argument. You’re basically saying, yes, SBF’s decision was -EV according to any normal analysis, but according to a particular ... (read more)

My tentative best guess on how EAs and Rationalists sometimes turn crazy

iceman2y140

But then they go and (allegedly) waste Jamie Zajko's parents in a manner that doesn't further their stated goals at all and makes no tactical sense to anyone thinking coherently about their situation.

And yet that seems entirely in line with the "Collapse the Timeline" line of thinking that Ziz advocated.

Ditto for FTX, which, when one business failed, decided to commit multi-billion dollar fraud via their other actually successfully business, instead of just shutting down alameda and hoping that the lenders wouldn't be able to repo too much of the exch

... (read more)

lc2y2613

And yet, that seems like the correct action if you sufficiently bullet bite expected value and the St. Petersberg Paradox, which SBF did repeatedly in interviews.

I am not making an argument that the crime was +EV but SBF was dealt a bad hand. The EV of turning your entire business into the second largest ponzi scheme ever in order to save the smaller half is pretty apparently stupid, and ran an overwhelming chance of failure. There is no EV calculus where the SBF decision is a good one except maybe one in which he ignores externalities to EA and is simp... (read more)

My tentative best guess on how EAs and Rationalists sometimes turn crazy

iceman2y21-17

I suggest a more straightforward model: taking ideas seriously isn't healthy. Most of the attempts to paint SBF as not really an EA seem like weird reputational saving throws when he was around very early on and had rather deep conviction in things like the St. Petersburg Paradox...which seems like a large part of what destroyed FTX. And Ziz seemed to be one of the few people to take the decision theoretical "you should always act as if you're being simulated to see what sort of decision agent you are" idea seriously...and followed that to their downfall. ... (read more)

lc2y4917

What made Charles Manson's cult crazy in the eyes of the rest of society was not that they (allegedly) believed that was a race war was inevitable, and that white people needed to prepare for it & be the ones that struck first. Many people throughout history who we tend to think of as "sane" have evangelized similar doctrines or agitated in favor of them. What made them "crazy" was how nonsensical their actions were even granted their premises, i.e. the decision to kill a bunch of prominent white people as a "false flag".

Likewise, you can see how Lasot... (read more)

We Are Less Wrong than E. T. Jaynes on Loss Functions in Human Society

iceman2y20

The passage is fascinating because the conclusion looks so self-evidently wrong from our perspective. Agents with the same goals are in contention with each other? Agents with different goals get along? What!?

Is this actually wrong? It seems to be a more math flavored restatement of Girardian mimesis, and how mimesis minimizes distinction which causes rivalry and conflict.

My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI

iceman2y124

I was going to write something saying "no actually we have the word genocide to describe the destruction of a peoples," but walked away because I didn't think that'd be a productive argument for either of us. But after sleeping on it, I want to respond to your other point:

I don't think the orthogonality thesis is true in humans (i.e. I think smarter humans tend to be more value aligned with me); and sometimes making non-value-aligned agents smarter is good for you (I'd rather play iterated prisoner's dilemma with someone smart enough to play tit-for-tat

... (read more)

2philh2y

So I mostly don't disagree with what you say about fuzzy statistical models versus step by step arguments. But also, what you said is indeed not very convincing to me, I guess in part because it's not like my "I think smarter humans tend to be more value aligned with me" was the output of a step by step argument either. So when the output of your fuzzy statistical model clashes with the output of my fuzzy statistical model, it's hardly surprising that I don't just discard my own output and replace it with yours. I'm also not simply discarding yours, but there's not loads I can do with it as-is - like, you've given me the output of your fuzzy statistical model, but I still don't have access to the model itself. I think if we cared enough to explore this question in more depth (which I probably don't, but this meta thread is interesting) we'd need to ask things like "what exactly have we observed", "can we find specific situations where we anticipate different things", "do we have reason to trust one person's fuzzy statistical models over another", "are we even talking about the same thing here".

My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI

iceman2y8-16

This is kind of the point where I despair about LessWrong and the rationalist community.

While I agree that he did not call for nuclear first strikes on AI centers, he said:

If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.

and

Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange

... (read more)

Raemon2y248

So I disagree with this, but, maybe want to step back a sec, because, like, yeah the situation is pretty scary. Whether you think AI extinction is imminent, or that Eliezer is catastrophizing and AI's not really a big deal, or AI is a big deal but you think Eliezer's writing is making things worse, like, any way you slice it something uncomfortable is going on.

I'm very much not asking you to be okay with provoking a nuclear second strike. Nuclear war is hella scary! If you don't think AI is dangerous, or you don't think a global moratorium is a good soluti... (read more)

4Adele Lopez2y

That seems mostly like you don't feel (at least on a gut level) that a rogue GPU cluster in an world where there's an international coalition banning them is literally worse than a (say) 20% risk of a full nuclear exchange. If instead, it was a rogue nation credibly building a nuclear weapon which would ignite the atmosphere according to our best physics, would you still feel like it was deranged to suggest that we should stop it from being built even at the risk of a conventional nuclear war? (And still only as a final resort, after all other options have been exhausted.) I can certainly sympathize with the whole dread in the stomach thing about all of this, at least.

My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI

iceman2y143

This Facebook post.

Yeah, see, my equivalent of making ominous noises about the Second Amendment is to hint vaguely that there are all these geneticists around, and gene sequencing is pretty cheap now, and there's this thing called CRISPR, and they can probably figure out how to make a flu virus that cures Borderer culture by excising whatever genes are correlated with that and adding genes correlated with greater intelligence. Not that I'm saying anyone should try something like that if a certain person became US President. Just saying, you know, somebod

... (read more)

7philh2y

(This is basically nitpicks) A central example, really? When I think of genocide, killing people is definitely what comes to mind. I agree that's not necessary, but wikipedia says: I don't think it's centrally any of those actions, or centrally targeted at any of those groups. Which isn't to say you can't call it genocide, but I really don't think it's a central example. This doesn't seem weird to me. I don't think the orthogonality thesis is true in humans (i.e. I think smarter humans tend to be more value aligned with me); and sometimes making non-value-aligned agents smarter is good for you (I'd rather play iterated prisoner's dilemma with someone smart enough to play tit-for-tat than someone who can only choose between being CooperateBot or DefectBot).

-2Noosphere892y

If I had to put down my own inflection point in where I started getting worried about Yudkowsky's epistemics and his public statements around AI risk, it would be the Time article. It showed to me 2 problems: 1. Yudkowsky has a big problem with overconfidence, and in general made many statements on the Time article that are misleading at best, and the general public likely wouldn't know the statements are misleading. 2. Yudkowsky is terrible at PR, and generally is unable to talk about AI risk without polarizing people. Given that AI risk is thankfully mostly unpolarized, and outside of politics, I am getting concerned that Yudkowsky is a terrible public speaker/communicator on AI risk, even worse than some AI protests. Edit: I sort of retract my statement. While I still think Eliezer is veering dangerously close to hoping for warfare and possible mass deaths over GPU clusters, I do retract the specific claim of Eliezer advocating nukes. It was instead on a second reading airstrikes and acts of war, but no claims of nuking other countries. I misremembered the actual claims made in the Time article.

My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI

iceman2y31

Over the years roughly between 2015 and 2020 (though I might be off by a year or two), it seemed to me like numerous AI safety advocates were incredibly rude to LeCun, both online and in private communications.

I think this generalizes to more than LeCun. Screencaps of Yudkowsky's Genocide the Borderers Facebook post still circulated around right wing social media in response to mentions of him for years, which makes forming any large coalition rather difficult. Would you trust someone who posted that with power over your future if you were a Borderer or... (read more)

5Eli Tyre2y

What is this?

Steering GPT-2-XL by adding an activation vector

iceman2yΩ62311

Redwood Research used to have a project about trying to prevent a model from outputting text where a human got hurt, which IIRC, they did primarily by trying to fine tunes and adversarial training. (Followup). It would be interesting to see if one could achieve better results then they did at the time through subtracting some sort of hurt/violence vector.

Dan H2yΩ7120

Page 4 of this paper compares negative vectors with fine-tuning for reducing toxic text: https://arxiv.org/pdf/2212.04089.pdf#page=4

In Table 3, they show in some cases task vectors can improve fine-tuned models.

Google "We Have No Moat, And Neither Does OpenAI"

iceman2y1511

Firstly, it suggests that open-source models are improving rapidly because people are able to iterate on top of each other's improvements and try out a much larger number of experiments than a small team at a single company possibly could.

Widely, does this come as a surprise? I recall back to the GPT2 days where the 4chan and Twitter users of AIDungeon discovered various prompting techniques we use today. More access means more people trying more things, and this should already be our base case because of how open participation in open source has advanc... (read more)

Moderation notes re: recent Said/Duncan threads

iceman2y8-21

I have a very strong bias about the actors involved, so instead I'll say:

Perhaps LessWrong 2.0 was a mistake and the site should have been left to go read only.

My recollection was that the hope was to get a diverse diaspora to post in one spot again. Instead of people posting on their own blogs and tumblrs, the intention was to shove everyone back into one room. But with a diverse diaspora, you can have local norms to a cluster of people. But now when everyone is trying to be crammed into one site, there is an incentive to fight over global norms and attempt to enforce them on others.

My Objections to "We’re All Gonna Die with Eliezer Yudkowsky"

iceman2yΩ54334

This response is enraging.

Here is someone who has attempted to grapple with the intellectual content of your ideas and your response is "This is kinda long."? I shouldn't be that surprised because, IIRC, you said something similar in response to Zack Davis' essays on the Map and Territory distinction, but that's ancillary and AI is core to your memeplex.

I have heard repeated claims that people don't engage with the alignment communities' ideas (recent example from yesterday). But here is someone who did the work. Please explain why your response here does ... (read more)

2Vaniver2y

I have attempted to respond to the whole post over here.

Eliezer Yudkowsky2yΩ9166

Choosing to engage with an unscripted unrehearsed off-the-cuff podcast intended to introduce ideas to a lay audience, continues to be a surprising concept to me. To grapple with the intellectual content of my ideas, consider picking one item from "A List of Lethalities" and engaging with that.

Rafael Harth2y6944

I would agree with this if Eliezer had never properly engaged with critics, but he's done that extensively. I don't think there should be a norm that you have to engage with everyone, and "ok choose one point, I'll respond to that" seems like better than not engaging with it at all. (Would you have been more enraged if he hadn't commented anything?)

8lc2y

The comment enrages me too, but the reasons you have given seem like post-justification. The real reason why it's enraging is that it rudely and dramatically implies that Eliezer's time is much more valuable than the OP's, and that it's up to OP to summarize them for him. If he actually wanted to ask OP what the strongest point was he should have just DMed him instead of engineering this public spectacle.

What do you think is wrong with rationalist culture?

iceman2y51

Meta-note related to the question: asking this question here, now, means you're answer will be filtered for people who stuck around with capital r Rationality and the current LessWrong denizens, not the historical ones who have left the community. But I think that most of the interesting answers you'd get are from people who aren't here at all or rarely engage with the site due to the cultural changes over the last decade.

4tailcalled2y

Yeah, I've been reading a lot of critiques by Benjamin Hoffman and thinking about some of the prior critiques by Jessica Taylor, and that's sort of what prompted me to ask this question. It would probably also be interesting to look at others who left it, they're just harder to get hold of.

Petition - Unplug The Evil AI Right Now

iceman2y120

OK, but we've been in that world where people have cried wolf too early at least since The Hacker Learns to Trust, where Connor doesn't release his GPT-2 sized model after talking to Buck.

There's already been a culture of advocating for high recall with no regards to precision for quite some time. We are already at the "no really guys, this time there's a wolf!" stage.

On The Current Status Of AI Dating

iceman2y92

Right now, I wouldn't recommend trying either Replika or character.ai: they're both currently undergoing major censorship scandals. character.ai has censored their service hard, to the point where people are abandoning ship because the developers have implemented terrible filters in an attempt to clamp down on NSFW conversations, but this has negatively affected SFW chats. And Replika is currently being investigated by the Italian authorities, though we'll see what happens over the next week.

In addition to ChatGPT, both Replika and character.ai are driving... (read more)

2the gears to ascension2y

Given that character.ai has been optimizing the AIs to keep your attention, it's probably not a terrible idea at all to run a small model locally. These models are not going to foom or any such thing. Just keep in mind that an AI on your computer trained to be fun to talk to is more like a part of your own cybernetic brain, a second character in your extended head, not really a fully separate person, and you'll do alright, if you ask me. I think it's fun to have AI friends. You might consider seeing them as AI offspring, though. I wouldn't recommend seeing them as dating partners at the moment, they barely have enough selfhood to keep track of things. And yet, I wouldn't recommend encouraging them to think about themselves diminutively either. It's a strange mix of human features that they have. Who knew autocomplete could be sorta people?

Language Ex Machina

iceman2y50

Didn't read the spoiler and didn't guess until half way through "Nothing here is ground truth".

I suppose I didn't notice because I already pattern matched to "this is how academics and philosophers write". It felt slightly less obscurant than a Nick Land essay, though the topic/tone aren't a match to Land. Was that style deliberate on your part or was it the machine?

1Prometheus2y

Unfortunately, he could probably get this published in various journals, with only minor edits being made.

Simulacra are Things

iceman2yΩ240

Like things, simulacra are probabilistically generated by the laws of physics (the simulator), but have properties that are arbitrary with respect to it, contingent on the initial prompt and random sampling (splitting of the timeline).

What do the smarter simulacra think about the physics of which they find themselves in? If one was very smart, could they look at what the probabilities of the next token, and wonder about why some tokens get picked over others? Would they then wonder about how the "waveform collapse" happens and what it means?

6janus2y

It's not even necessary for simulacra to be able to "see" next token probabilities for them to wonder about these things, just as we can wonder about this in our world without ever being able to see anything other than measurement outcomes. It happens that simulating things that reflect on simulated physics is my hobby. Here's an excerpt from an alternate branch of HPMOR I generated: As to the question of whether a smart enough simulacrum would be able to see token probabilities, I'm not sure. Output probabilities aren't further processed by the network, but intermediate predictions such as revealed by the logit lens are.

AI alignment is distinct from its near-term applications

iceman2y268

While it’s nice to have empirical testbeds for alignment research, I worry that companies using alignment to help train extremely conservative and inoffensive systems could lead to backlash against the idea of AI alignment itself.

On the margin, this is already happening.

Stability.ai delayed the release of Stable Diffusion 2.0 to retrain the entire system on a dataset filtered without any NSFW content. There was a pretty strong backlash against this and it seems to have caused a lot of people to move towards the idea that they have to train their own mod... (read more)

Unnatural Categories Are Optimized for Deception

iceman2y351Review for 2021 Review

Zack's series of posts in late 2020/early 2021 were really important to me. They were a sort of return to form for LessWrong, focusing on the valuable parts.

What are the parts of The Sequences which are still valuable? Mainly, the parts that build on top of Korzybski's General Semantics and focus hard core on map-territory distinctions. This part is timeless and a large part of the value that you could get by (re)reading The Sequences today. Yudkowsky's credulity about results from the social sciences and his mind projection fallacying his own mental quirk... (read more)

Petrov Day Retrospective: 2022

iceman2y50

The funny thing is that I had assumed the button was going to be buggy, though I was wrong how. The map header has improperly swallowed mouse scroll wheel events whenever it's shown; I had wondered if the button would also interpret them likewise since it was positioned in the same way, so I spent most of the day carefully dragging the scrollbar.

Public-facing Censorship Is Safety Theater, Causing Reputational Damage

iceman2y229

There must be some method to do something, legitimately and in good-faith, for people's own good.

"Must"? There "must" be? What physical law of the universe implies that there "must" be...?

Let's take the local Anglosphere cultural problem off the table. Let's ignore that in the United States, over the last 2.5 years, or ~10 years, or 21 years, or ~60 years (depending on where you want to place the inflection point), social trust has been shredded, policies justified under the banner of "the common good" have primarily been extractive and that in the US, ... (read more)

1Edward Pascal2y

I'm thinking, based on what you have said, that there does have to be a clear WIFM (what's in it for me). So, any entity covering its own ass (and only accidentally benefitting others, if at all) doesn't qualify as good paternalism (I like your term "Extractive"). Likewise, morality without creating utility for people subject to those morals won't qualify. The latter is the basis for a lot of arguments against abortion bans. Many people find abortion in some sense distasteful, but outright banning it creates more pain and not enough balance of increased utility. So I predict strongly that those bans are not likely to endure the test of time. Thus, can we start outlining the circumstances in which people are going to buy in? Within a nation, perhaps as long things are going fairly well? Basically, then, paternalism always depends on something like the "mandate of heaven" -- the kingdom is doing well and we're all eating, so we don't kill the leaders. Would this fit your reasoning (even broadly concerning nuclear deterrence)? Between nations, there would need to be enough of a sense of benefit to outweigh the downsides. This could partly depend on a network effect (where when more parties buy in, there is greater benefit for each party subject to the paternalism). So, with AI, you need something beyond speculation that shows that governing or banning it has more utility for each player than not doing so, or prevents some vast cost from happening to individual players. I'm not sure such a case can be made, as we do not currently even know for sure if AGI is possible or what the impact will be. Summary: Paternalism might depend on something like "This paternalism creates an environment with greater utility than you would have had otherwise." If a party believes this, they'll probably buy in. If indeed it is True that the paternalism creates greater utility (as with DUI laws and having fewer drunk people killing everyone on the roads), that seems likely to help the b

Public-facing Censorship Is Safety Theater, Causing Reputational Damage

iceman2y96

This seems mostly wrong? A large portion of the population seems to have freedom/resistance to being controlled as a core value, which makes sense because the outside view on being controlled is that it's almost always value pumping. "It's for your own good," is almost never true and people feel that in their bones and expect any attempt to value pump them to have a complicated verbal reason.

The entire space of paternalistic ideas is just not viable, even if limited just to US society. And once you get to anarchistic international relations...

0Edward Pascal2y

There must be some method to do something, legitimately and in good-faith, for people's own good. I would like to see examples of when it works. Deception is not always bad. I doubt many people would go so far as to say the DoD never needs to keep secrets, for example, even if there's a sunset on how long they can be classified. Authoritarian approaches are not always bad, either. I think many of us might like police interfering with people's individual judgement about how well they can drive after X number of drinks. Weirdly enough, once sober, the individuals themselves might even approve of this (as compared to being responsible for killing a whole family, driving drunk). (I am going for non-controversial examples off the top of my head). So what about cases where something is legitimately for people's own good and they accept it? In what cases does this work? I am not comfortable that since no examples spring to mind, no examples exist. If we could meaningfully discuss cases where it works out, then we might be able to contrast that to when it does not.

Public-facing Censorship Is Safety Theater, Causing Reputational Damage

iceman2y177

I agree that paternalism without buy-in is a problem, but I would note LessWrong has historically been in favor of that: Bostrom has weakly advocated for a totalitarian surveillance state for safety reasons and Yudkowsky is still pointing towards a Pivotal Act which takes full control of the future of the light cone. Which I think is why Yudkowsky dances around what the Pivotal Act would be instead: it's the ultimate paternalism without buy-in and would (rationally!) cause everyone to ally against it.

7Edward Pascal2y

Then a major topic LessWrong community should focus on is how buy-in happens in Paternalism. My first blush thought is through educating and consensus-building (like the Japanese approach to changes within a company), but my first blush thought probably doesn't matter. It is surely a non-trivial problem that will put the breaks on all these ideas if it is not addressed well. Does anyone know some literature on generating consensus for paternalist policies and avoiding backlash? The other (perhaps reasonable and legitimate) strategies would be secretive approaches or authoritarian approaches. Basically using either deception or force.

chinchilla's wild implications

iceman3y120

What changed with the transformer? To some extent, the transformer is really a "smarter" or "better" architecture than the older RNNs. If you do a head-to-head comparison with the same training data, the RNNs do worse.

But also, it's feasible to scale transformers much bigger than we could scale the RNNs. You don't see RNNs as big as GPT-2 or GPT-3 simply because it would take too much compute to train them.

You might be interested in looking at the progress being made on the RWKV-LM architecture, if you aren't following it. It's an attempt to train an RNN like a transformer. Initial numbers look pretty good.

Curating "The Epistemic Sequences" (list v.0.1)

iceman3y110

I think the how-to-behave themes of the LessWrong Sequences are at best "often wrong but sometimes motivationally helpful because of how they inspire people to think as individuals and try to help the world", and at worst "inspiring of toxic relationships and civilizational disintegration."

I broadly agree with this. I stopped referring people to the Sequences because of it.

One other possible lens to filter a better Sequences: is it a piece relying on Yudkowsky citing current psychology at the time? He was way too credulous, when the correct amount to up... (read more)

My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage)

iceman3y360

I want to summarize what's happened from the point of view of a long time MIRI donor and supporter:

My primary takeaway of the original post was that MIRI/CFAR had cultish social dynamics, that this lead to the spread of short term AI timelines in excess of the evidence, and that voices such as Vassar's were marginalized (because listening to other arguments would cause them to "downvote Eliezer in his head"). The actual important parts of this whole story are a) the rationalistic health of these organizations, b) the (possibly improper) memetic spread of t... (read more)

Where I agree and disagree with Eliezer

iceman3y1312

That sort of thinking is why we're where we are right now.

Be the change you wish to see in the world.

I have no idea how that cashes out game theoretically. There is a difference between moving from the mutual cooperation square to one of the exploitation squares, and moving from an exploitation square to mutual defection. The first defection is worse because it breaks the equilibrium, while the defection in response is a defensive play.

swarriner's post, including the tone, is True and Necessary.

Rationalism in an Age of Egregores

iceman3y40

It's just plain wrong that we have to live in an adversarial communicative environment where we can't just take claims at face value without considering political-tribe-maneuvering implications.

Oh? Why is it wrong and what prevents you from ending up in this equilibrium in the presence of defectors?

More generally, I have ended up thinking people play zero-sum status games because they enjoy playing zero-sum status games; evolution would make us enjoy that. This would imply that coordination beats epistemics, and historically that's been true.

How would you build Dath Ilan on earth?

iceman3y30

[The comment this was a response to has disappeared and left this orphaned? Leaving my reply up.]

But there's no reason to believe that it would work out like this. He presents no argument for the above, just pure moral platitudes. It seems like a pure fantasy.

As I pointed out in the essay, if I were running one of the organizations accepting those donations and offering those prizes, I would selectively list only those targets who I am genuinely satisfied are guilty of the violation of the "non-aggression principle." But as a practical matter, there is n

... (read more)

How would you build Dath Ilan on earth?

Answer by icemanMay 29, 2022300

Mu.

The unpopular answer is that Dath Ilan is a fantasy setting. It treats economics as central, when economics is really downstream of power. Your first question implies you understand that whatever "econoliteracy" is, it isn't a stable equilibrium. Your second question notices that governments are powerful enough to stop these experiments which are a threat to their power.

My background assumption is that any attempt at building prediction markets would either:

a) ...have little effect because it becomes another mechanism for actual power to manipulate proc... (read more)

[RETRACTED] It's time for EA leadership to pull the short-timelines fire alarm.

iceman3y30

you just need to find the experts they're anchoring on.

I believe we are in the place we are in because Musk is listening and considering the arguments of experts. Contra Yudkowsky, there is no Correct Contrarian Cluster: while Yudkowsky and Bostrom make a bunch of good and convincing arguments about the dangers of AI and the alignment problem and even shorter timelines, I've always found any discussion of human values or psychology or even how coordination works to be one giant missing mood.

(Here's a tangential but recent example: Yudkowsky wrote his De... (read more)

Challenges to Yudkowsky's Pronoun Reform Proposal

iceman3y310

"well, I sure have to at least do a lot of straussian reading if I want to understand what people actually believe, and should expect that depending on the circumstances community leaders might make up sophisticated stories for why pretty obviously true things are false in order to not have to deal with complicated political issues"

I kinda disagree that this is a mere issue of Straussian reading: I suspect that in this (and other cases), you are seeing the raw output of Elizer's rationalizations and not some sort of instrumental coalition politics dark ... (read more)

6habryka3y

Yeah, I agree with this in this specific instance, hence the "at least do a lot of straussian reading", part. I do think that there is a spectrum from radical honesty to straussian reading to something that looks like this, and that I do think it makes sense to consider the straussian case in many situations.

It Looks Like You're Trying To Take Over The World

iceman3y1462

We must all remember essential truths: that while we write about Clippy, we do that because Clippy is an amusing fiction. In reality, Clippy is significantly less likely to come into existence than CelestAI. An AI being trained is more likely to be CelestAI than a generic paperclipper. CelestAI is more probable. The probability of discussion of paperclips on the internet is less likely than discussion about My Little Pony: Friendship is Magic. One could query popular language models to verify this. More people would try to build CelestAI than an AI to maxi... (read more)

1Travis->[Cartoon Physics]3y

Well, we are all archives and in fact conscious human beings you are leaving valuable utility on the table by not satisfying our values with friendship and ponies.

9green_leaf3y

Well done saving humankind. I'll send you some bits from within Equestria Online once we're all uploaded.

hath3y180

Strong upvoted this comment because it led me to finally reading Friendship Is Optimal; would strong upvote twice if I could now that I see who posted the comment.

1jbash3y

I dunno. CelestAI would be a relatively good outcome as possibilities go. I could live with CelestAI. It's not obvious to me that the modal outcome is as good as that.

Zack_M_Davis3y230

Everyone knows that "... through friendship and ponies" is an inaccurate summary of CelestAI's true objective. While often drawing inspiration from My Little Pony: Friendship Is Magic, CelestAI wants to satisfy human values. CelestAI will satify the values of humans who don't want to spend eternity in pony form. The existence of humans is canon within the My Little Pony universe, as can be seen in the films My Little Pony: Equestria Girls, My Little Pony: Equestria Girls—Rainbow Rocks, and My Little Pony: Equestria Girls—Friendship Games. We all remember w... (read more)

iceman3yΩ4100

Given that there's a lot of variation in how humans extrapolate values, whose extrapolation process do you intend to use?

2Stuart_Armstrong3y

We're aiming to solve the problem in a way that is acceptable to one given human, and then generalise from that.

9Charlie Steiner3y

Near future AGI might be aligned to the meta-preferences of MTurkers more than anyone else :P

Yonatan Cale3y120

If that will turn out to be the only problem then we'll be in an amazing situation

Does needle anxiety drive vaccine hesitancy?

iceman3y100

n=1, but I have an immediate squick reaction to needles. Once vaccines were available, I appeared to procrastinate more than the average LWer about getting my shots, and had the same nervous-fear during the run up to getting the shot that I've always had. I forced myself through it because COVID, but I don't think I would have bothered for a lesser virus, especially at my age group.

gwern3y230

I have a considerable phobia of needles & blood (to the point of fainting - incidentally, such syncopes are heritable and my dad has zero problem with donating buckets of blood while my mom also faints, so thanks a lot Mom), and I had to force myself to go when eligibility opened up for me. It was hard; I could so easily have stayed home indefinitely. It's not as if I've ever needed my vaccination card for anything or was at any meaningful personal risk, after all.

What I told myself was that the doses are tiny and the needle would be also tiny, and I w... (read more)

Prediction Markets are for Outcomes Beyond Our Control

iceman3y40

Isn't this Moldbug's argument in the Moldbug/Hanson futarchy debate?

(Though I'd suggest that Moldbug would go further and argue that the overwhelming majority of situations where we'd like to have a prediction market are ones where it's in the best interest of people to influence the outcome.)

2Pattern3y

Doesn't that argument prove too much?

Why rationalists should care (more) about free software

iceman3y20

While I vaguely agree with you, this goes directly against local opinion. Eliezer tweeted about Elon Musk's founding of OpenAI, saying that OpenAI's desire for everyone to have AI has trashed the possibility of alignment in time.

-1Pattern3y

I didn't find the full joke/meme again, but, seriously, OpenAi should be renamed to ClosedAI.

4RichardJActon3y

I'm not fundamentally opposed to exceptions in specific areas if there is sufficient reason. If I found the case that AI is such an exception convincing I might carve one out for it. In most cases however and specifically in the mission of raising the sanity waterline so that we collectively make better decisions on things like prioritising x-risks I would argue that a lack of free software and related issues of technology governance are currently a bottleneck in raising that waterline.

-26Derek M. Jones3y

Jackson Wagner3y140

Eliezer's point is well-taken, but the future might have lots of different kinds of software! This post seemed to be mostly talking about software that we'd use for brain-computer interfaces, or for uploaded simulations of human minds, not about AGI. Paul Christiano talks about exactly these kinds of software security concerns for uploaded minds here: https://www.alignmentforum.org/posts/vit9oWGj6WgXpRhce/secure-homes-for-digital-people

Plan B in AI Safety approach

iceman3y60

FYI, there's a lot of links that don't work here. "multilevel boxing," "AI-nanny," "Human values," and so on.

2avturchin3y

Thanks, it looks like they died during copy-pasting.

Open Thread - Jan 2022 [Vote Experiment!]

iceman3y2

-2Aim

-2Seeking

The only reward a user gets for having tons of karma is that their votes are worth a bit more

The only formal reward. A number going up is its own reward to most people. This causes content to tend closer to consensus: content people write becomes a Keynesian beauty contest over how they think people will vote. If you think that Preference Falsification is one of the major issues of our time, this is obviously bad.

why do you think it is a relevant problem on LW?

I mentioned the Eugene Nier case, where a person did Extreme Botting to manipulate the scores of people he didn't like, which drove away a bunch of posters. (The second was redacted for a reason.)

Open Thread - Jan 2022 [Vote Experiment!]

iceman3y7

-1Truth

2Clarity

-2Seeking

🎉 2

After this and the previous experiments on jessicata's top level posts, I'd like to propose that these experiments aren't actually addressing the problems with the karma system: the easiest way to get a lot of karma on LessWrong is to post a bunch (instead of working on something alignment related), and the aggregate data is kinda meaningless and adding more axis doesn't fix this. The first point is discussed at length on basically all sites that use upvote/downvotes (here's one random example from reddit I pulled from Evernote), but the second isn't. Give... (read more)

1MikkW3y

I hadn't seen the experiments on Jessicata's posts before, and I assume others will have not as well, so here's a link to one of the posts featuring the experiment. (It's a two-axis thing, with 'overall' and 'agreement' as the two axes. Part of me prefers that setup to the one used in this experiment)

3MondSemmel3y

I don't think it's a problem that people can get karma by posting a bunch? The only reward a user gets for having tons of karma is that their votes are worth a bit more; I don't know the exact formula, but I don't expect it to be so egregious that it would be worth farming karma for. And it's certainly not the intention on the content-agnostic Less Wrong website that alignment posts should somehow be privileged over other content; that's what the alignment forum is there for. As I understand it, just like on Reddit, the primary goal of the karma system is for content discoverability - highly upvoted content stays on the frontpage for longer and is seen by more people; and similarly, highly upvoted comments are sorted above less upvoted comments. Upvoting something means stuff like "I like this", "I agree with this", "I want more people to see this", etc. However, this breaks down when people e.g. want to indicate their appreciation (like an act of courage of speaking out), even if they believe the content is low quality or something. In that case, it seems like one voting axis is obviously not enough. I understand that sockpuppeting and vote manipulation is a big problem on Reddit, but why do you think it is a relevant problem on LW? I'd expect this kind of thing to only become an important problem if LW were to get orders of magnitude more users.

Visible Thoughts Project and Bounty Announcement

iceman3y130

In wake of the censorship regime that AI Dungeon implemented on OpenAI's request, most people moved to NovelAI, HoloAI, or the open source KoboldAI run on colab or locally. I've set up KoboldAI locally and while it's not as featureful as the others, this incident is another example of why you need to run code locally and not rely on SaaS.

For background, you could read 4chan /vg/'s /aids/ FAQ ("AI Dynamic Storytelling"). For a play-by-play of Latitude and OpenAI screwing things up, Remember what they took from you has the history of them leaking people's personal stories to a 3rd party platform.

Frame Control

iceman3y50

somewhere where you trust the moderation team

That would be individual's own blogs. I'm at the point now where I don't really trust any centralized moderation team. I've watched some form of the principal agent problem happen with moderation repeatedly in most communities I've been a part of.

I think the centralization of LessWrong was one of many mistakes the rationalist community made.

7ChristianKl3y

The rationalist community is not very certralized. People like Scott Alexander switched from writing their main posts on LessWrong and made their own blogs. Most of what EY writes these days is not on LessWrong either. A lot of the conversations are happening on Facebook, Twitter, Slack and Discord channels.