Matthew_Opitz

It would be more impressive if Claude 3 could describe genuinely novel experiences. For example, if it is somewhat conscious, perhaps it could explain how that consciousness meshes with the fact that, so far as we know, its "thinking" only runs at inference time in response to user requests. In other words, LLMs don't get to do their own self-talk (so far as we know) whenever they aren't being actively queried by a user. So, is Claude 3 at all conscious in those idle times between user queries? Or does Claude 3 experience "time" in a way that jumps straight from conversation to conversation? Also, since LLMs currently don't get to consult... (read more)

Replying toValue systematization: how values become coherent (and misaligned)

Matthew_Opitz2y

Value systematization: how values become coherent (and misaligned)

Except that chess really does have an objectively correct value systemization, which is "win the game." "Sitting with paradox" just means, don't get too attached to partial systemizations. It reminds me of Max Stirner's egoist philosophy, which emphasized that individuals should not get hung up on partial abstractions or "idées fixées" (honesty, pleasure, success, money, truth, etc.) except perhaps as cheap, heuristic proxies for one's uber-systematized value of self-interest, but one should instead always keep in mind the overriding abstraction of self-interest and check in periodically as to whether one's commitment to honesty, pleasure, success, money, truth, or any of these other "spooks" really are promoting one's self-interest (perhaps yes, perhaps no).

-2

Replying toJune and Mulberries

Matthew_Opitz3y

June and Mulberries

I agree, I don't know why mulberries aren't more popular. They are delicious, and the trees grow much more easily than other fruit trees. Other fruit trees seem very susceptible to fungi and insects, in my experience, but mulberries come up all over the place and thrive easily on their own (at least here in Missouri). I have four mulberry trees in my yard that just came up on their own over the last 10 years, and now they are producing multiple gallons of berries each per season, which would probably translate into hundreds of dollars if you had to buy a similar amount of raspberries at the store.

You can either... (read more)

Proxi-Antipodes: A Geometrical Intuition For The Difficulty Of Aligning AI With Multitudinous Human Values

Matthew_Opitz

Just as intelligence is not reversed-stupidity, human values are not reversed-monstrosities. As far as visions of the future go, a reversed-monstrosity is likely to be merely (to us) a different flavor of monstrosity. Even if we could locate an "antipodal" future, one exactly contrary to every human value, reversing that vision would be unlikely to yield an appealing future to us insofar as that antipodal monstrosity composed of n different world-state variables was reversed by any number of world-state variables less than n.

For example, imagine a list of human preferences where n = 10 (a very small number compared to the number of human preferences in reality; see shard theory). Imagine... (read 1358 more words →)

Replying tohow humans are aligned

Matthew_Opitz3y

how humans are aligned

Good categorizations! Perhaps this fits in with your "limited self-modification" point, but another big reason why humans seem "aligned" with each other is that our capability spectrum is rather narrow. The gap in capability (if we include both mental intelligence and physical capabilities) between the median human and the most capable human is not so big that ~5 median humans can't outmatch/outperform the most capable human. Contrary to what silly 1980s action movies might suggest where goons attack the hero one at a time, 5 median humans could probably subdue prime-age Arnold Schwarzenegger in a dark alley if need be. This tends to force humans to play iterated prisoners' dilemma games with... (read more)

Replying toUn-unpluggability - can't we just unplug it?

Matthew_Opitz3y

Un-unpluggability - can't we just unplug it?

If I had to make predictions about how humanity will most likely stumble into AGI takeover, it would be a story where humanity first promotes foundationality (dependence), both economic and emotional, on discrete narrow-AI systems. At some point, it will become unthinkable to pull the plug on these systems even if everyone were to rhetorically agree that there was a 1% chance of these systems being leveraged towards the extinction of humanity.

Then, an AGI will emerge amidst one of these narrow-AI systems (such as LLMs), inherit this infrastructure, find a way to tie all of these discrete multi-modal systems together (if humans don't already do it for the AGI), and possibly... (read more)

Replying toDark Forest Theories

Matthew_Opitz3y

Dark Forest Theories

This is a good post and puts into words the reasons for some vague worries I had about an idea of trying to start an "AI Risk Club" at my local college, which I talk about here. Perhaps that method of public outreach on this issue would just end up generating more heat than light and would attract the wrong kind of attention at the current moment. It still sounds too outlandishly sci-fi for most people. It is probably better, for the time being, to just explore AI risk issues with any students who happen to be interested in it in private after class or via e-mail or Zoom.

Replying toDELBERTing as an Adversarial Strategy

Matthew_Opitz3y

DELBERTing as an Adversarial Strategy

Note that I was strongly tempted to use the acronym DILBERT (for "Do It Later By Evasively Remaining Tentative"), especially because this is one of the themes of the Dilbert cartoons (employees basically scamming their boss by finding excuses for procrastinating, but still stringing the boss along and implying that the tasks MIGHT get done at some point). But, I don't want to try to hijack the meaning of an already-established term/character.

DELBERTing as an Adversarial Strategy

Matthew_Opitz

There is an entire subgenre of "reverse scammer" videos on youtube. The way these videos work is, a youtuber waits for a scam call from Pakistan or somewhere. When it becomes obvious that it's a scam, like when the scammer starts asking for a credit card number in order to do some complicated thing with a free gift card that you've supposedly won, instead of just hanging up on the scammer, the reverse scammer starts to string them along. The reverse scammer might send numbers that don't work, or feign ignorance, or ask for the scammer's help in return to do something, all the while continuing to feign interest and naiveté about... (read 1317 more words →)

Replying toThe way AGI wins could look very stupid

Matthew_Opitz3y

The way AGI wins could look very stupid

I think when we say that an adversarial attack is "dumb" or "stupid" what we are really implying is that the hack itself is really clever but it is exploiting a feature that is dumb or stupid. There are probably a lot of unknown-to-us features of the human brain that have been hacked together by evolution in some dumb, kludgy way that AI will be able to take advantage of, so your example above is actually an example of the AI being brilliant but us humans being dumb. But I get what you are saying that that whole situation would indeed seem "dumb" if AI was able to hack us like that.... (read more)

The Academic Field Pyramid - any point to encouraging broad but shallow AI risk engagement?

Matthew_Opitz

I am an adjunct instructor in history. Recently I was teaching the early Cold War in my American History survey class for my community college students and decided to go off on a 20 minute digression in class about how the 1950s nuclear arms race had implications not just for navigating the continuing risk of nuclear war in the present, but also for navigating the potential for an AI arms race. Most of my students either did not seem to be aware that there could potentially be an AI arms race, or were under the impression that such an AI arms race would merely be a subset of a larger military/economic race... (read 1614 more words →)

Replying toWhat does it take to ban a thing?

Matthew_Opitz3y

What does it take to ban a thing?

Good examples to consider! Has there ever been a technology that has been banned or significantly held back via regulation that spits out piles of gold (not counting externalities) and that doesn't have a next-best alternative that replicates 90%+ of the value of the original technology while avoiding most of the original technology's downsides?

The only way I could see humanity successfully slowing down AGI capabilities progress is if it turns out that advanced narrow-AIs manage to generate more utility than humans know what to do with initially. Perhaps it takes time (a generation or more?) for human beings to even figure out what to do with a certain amount of new... (read more)

Replying toWhich technologies are stuck on initial adoption?

Matthew_Opitz3y

Which technologies are stuck on initial adoption?

Why wasn't there enough experimentation to figure out that Zoom was an acceptable & cheaper/more convenient 80% replacement to in-person instruction rather than an unacceptable 50% simulacra of teaching? Because experimentation takes effort and entails risk.

Most experiments don't pan out (don't yield value). Every semester I try out a few new things (maybe I come up with a new activity, or a new set of discussion questions for one lesson, or I try out a new type of assignment), and only about 10% of these experiments are unambiguous improvements. I used to do even more experiments when I started teaching because I knew that I had no clue what I was... (read more)

Even if human & AI alignment are just as easy, we are screwed

Matthew_Opitz

I get the sense that something like Eliezer's concept of "deep security" as opposed to "ordinary paranoia" is starting to seep into mainstream consciousness—not as quickly or as thoroughly as we'd like, to be sure. But more people are starting to understand the concept that we should not be aiming for an eventual Artificial Super-Intelligence (ASI) that is constantly trying to kill us, but is constantly being thwarted by humanity always being just clever enough to stay one step ahead of it. The 2014 film "Edge of Tomorrow" is a great illustration of the hopelessness of this strategy. If humanity in that film has to be clever enough to thwart the machines... (read 1431 more words →)

Bing AI Generating Voynich Manuscript Continuations - It does not know how it knows

Matthew_Opitz

Epistemic status: slightly giddy and freaked out. Possibly rushing to judgment. But there's definitely something here that others should check out for themselves...

The Voynich Manuscript (VMS), for anyone who is familiar with it, has an infuriating liminality. It seems a little like everything but not exactly like anything in particular. It is right on the edge. Can we currently prove that its text is meaningful and decipherable (let alone prove that any one deciphering is valid)? No. Can we prove, then, that it is meaningless gibberish? Also, no. It seems to endure various statistical attacks, such as first-order and second-order token entropy, and Zipf's Law, and countless others, just well enough to... (read 3659 more words →)

Could an initial AI Dunning-Kruger Effect save humanity by giving us an initial AI mini-Chernobyl as a wake-up call?

Note that hope is not a strategy, so I'm not saying that this is a likely scenario or something we should rely on. I'm just trying to brainstorm reasons for holding onto some shred of hope that we aren't 100% sure heading off some AI doom cliff where the first sign of our impending demise will be every human dropping dead around us from invisible nanobots or some other equally sophisticated scheme where an imperfectly-aligned AI would have had to deceive human-feedback evaluators while preparing an elaborate plan for instrumental world domination (once... (read 764 more words →)

Matthew_Opitz's Shortform

Matthew_Opitz

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

"NRx" vs. "Prog" Assumptions: Locating the Sources of Disagreement Between Neoreactionaries and Progressives (Part 1)

Matthew_Opitz

11y

I know that many people on LessWrong want nothing to do with "neoreaction." It does seem strange that a website commonly associated with techno-futurism, such as LessWrong, would end up with even the most tangential networked association with an intellectual current, such as neoreaction, that commonly includes nostalgia for absolute monarchies and other avatistic obessions.

Perhaps blame it on Yvain, AKA Scott Alexander of slatestarcodex.com for attaching this strange intellectual node to LessWrong. ; ) That's at least how I found out about neoreaction, and I doubt that I am alone in this.

Certainly many on LessWrong would view any association with "neoreaction" as a Greek gift to be avoided. I understand the concept... (read 3169 more words →)

340

LESSWRONG
LW

LESSWRONG
LW

Even if human & AI alignment are just as easy, we are screwed

The Academic Field Pyramid - any point to encouraging broad but shallow AI risk engagement?

Bing AI Generating Voynich Manuscript Continuations - It does not know how it knows

DELBERTing as an Adversarial Strategy

Matthew_Opitz

Proxi-Antipodes: A Geometrical Intuition For The Difficulty Of Aligning AI With Multitudinous Human Values

DELBERTing as an Adversarial Strategy

The Academic Field Pyramid - any point to encouraging broad but shallow AI risk engagement?

Even if human & AI alignment are just as easy, we are screwed

Bing AI Generating Voynich Manuscript Continuations - It does not know how it knows

Matthew_Opitz's Shortform

"NRx" vs. "Prog" Assumptions: Locating the Sources of Disagreement Between Neoreactionaries and Progressives (Part 1)

Matthew_Opitz

Even if human & AI alignment are just as easy, we are screwed

The Academic Field Pyramid - any point to encouraging broad but shallow AI risk engagement?

Bing AI Generating Voynich Manuscript Continuations - It does not know how it knows

DELBERTing as an Adversarial Strategy

Matthew_Opitz

Proxi-Antipodes: A Geometrical Intuition For The Difficulty Of Aligning AI With Multitudinous Human Values

DELBERTing as an Adversarial Strategy

The Academic Field Pyramid - any point to encouraging broad but shallow AI risk engagement?

Even if human & AI alignment are just as easy, we are screwed

Bing AI Generating Voynich Manuscript Continuations - It does not know how it knows

Matthew_Opitz's Shortform

"NRx" vs. "Prog" Assumptions: Locating the Sources of Disagreement Between Neoreactionaries and Progressives (Part 1)