All of roha's Comments + Replies

roha62

Further context about the "recent advancements in the AI sector have resolved this issue" paragraph:

roha10

I assume they can't make a statement and that their choice of next occupation will be the clearest signal they can and will send out to the public.

roha169

He has a stance towards risk that is a necessary condition for becoming the CEO of a company like OpenAI, but that doesn't give you a high probability of building a safe ASI:

... (read more)
roha174

If everyone has their own asteroid impact, Earth will not be displaced, because the impulse vectors will cancel each other out on average*. This is important because it will keep Earth's trajectory in equilibrium, which we have known for ages from animals jumping up and down all the time around the globe in their games of survival. If only a few central players get asteroid impacts, it's actually less safe! Safety advocates might actually cause the very outcomes that they fear!

*I have a degree in quantum physics and can derive everything from my model of the universe. This includes the moral and political imperatives that physics dictates and that most physicists therefore advocate for.

roha150

We are decades if not centuries away from developing true asteroid impacts.

roha120

Given all the potential benefits, there is no way we are not going to redirect asteroids to Earth. Everybody will have an abundance of rare elements.

xlr8

Linch137

rare earth metals? More like common space metals, amirite?

roha172

Some context from Paul Christiano's work on RLHF and a later reflection on it:

Christiano et al.: Deep Reinforcement Learning from Human Preferences

In traditional reinforcement learning, the environment would also supply a reward [...] and the agent's goal would be to maximize the discounted sum of rewards. Instead of assuming that the environment produces a reward signal, we assume that there is a human overseer who can express preferences between trajectory segments. [...] Informally, the goal of the agent is to produce trajectories which are preferred by t

... (read more)
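The setup Christiano et al. describe, learning a reward model from pairwise human preferences rather than an environment-supplied reward, can be sketched with a Bradley-Terry (logistic) loss over segment comparisons. The following toy sketch is illustrative only, not the paper's implementation: the linear reward model, learning rate, and toy data are all assumptions (the paper fits a neural network over trajectory segments).

```python
# Toy sketch of reward learning from pairwise human preferences,
# in the spirit of Christiano et al. (2017). Illustrative only.
import math
import random

def reward(weights, segment):
    """Linear reward over a segment's features (toy stand-in for the
    paper's neural-network reward estimate)."""
    return sum(w * x for w, x in zip(weights, segment))

def preference_prob(weights, seg_a, seg_b):
    """P(a preferred over b) under the Bradley-Terry model:
    logistic function of the predicted reward difference."""
    ra, rb = reward(weights, seg_a), reward(weights, seg_b)
    return 1.0 / (1.0 + math.exp(rb - ra))

def train(comparisons, dim, lr=0.1, steps=2000, seed=0):
    """Fit reward weights by minimizing cross-entropy on human labels.
    comparisons: list of (seg_a, seg_b, mu) with mu = 1 if a preferred."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(steps):
        seg_a, seg_b, mu = rng.choice(comparisons)
        p = preference_prob(w, seg_a, seg_b)
        g = p - mu  # gradient of -[mu*log p + (1-mu)*log(1-p)]
        for i in range(dim):
            w[i] -= lr * g * (seg_a[i] - seg_b[i])
    return w

# Toy labels: the "human" prefers segments with a larger first feature.
data = [([1.0, 0.0], [0.0, 1.0], 1), ([0.5, 0.2], [0.1, 0.9], 1)]
w = train(data, dim=2)
print(preference_prob(w, [1.0, 0.0], [0.0, 1.0]) > 0.5)  # True
```

In the full method this learned reward then stands in for the environment's reward signal when training the policy; the sketch above covers only the reward-model half.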
roha12

Replacing "must" with "may" is a potential solution to the issues discussed here. I think analogies are misleading when they are used as a means of proof, i.e. convincing yourself or others of the truth of some proposition, but they can be extremely useful when they are used as a means of exploration, i.e. discovering new propositions worthy of investigation. Taken seriously, this means that if you find something of interest with an analogy, it should not mark the end of a thought process or conversation, but the beginning of a validation process: Is there just... (read more)

roha30

For collaboration on job-like tasks, that assumption might hold. For companionship and playful interactions, I think the visual domain, possibly in VR/AR, will be found to be relevant and kept. Given our psychological priors, I also think that for many people it may feel like a qualitative change in what kind of entity we are interacting with: from lifeless machine, through uncanny human imitation, to believable personality on another substrate.

roha10

Empirical data point: In my experience, talking to Inflection's Pi on the phone covers the low-latency integration of "AI is capable of holding a conversation over text, transcribing speech to text, and synthesizing natural-sounding speech" sufficiently well to pass some bar of "feels authentically human" for me, until you try to test its limits. I imagine that subjective experience is more likely to appear if you don't have background knowledge about LLMs / DL. Its main problems are 1) keeping track of context in a plausibly human-like way (e.g. playing a ... (read more)

roha51

Meta-questions: How relevant are nanotechnological considerations for x-risk from AI? How suited are scenarios involving nanotech for making a plausible argument for x-risk from AI, i.e. one that convinces people to take the risk seriously and to become active in attempting to reduce it?

gilch158

The AI x-risk thesis doesn't require nanotech. Dangerously competent AIs are not going to openly betray us until they think they can win, which means, at minimum, they don't need us to maintain the compute infrastructure they'd need to stay alive. Currently, AI chips do require our globalized economy to produce.

AI takeover is a highly disjunctive claim; there are a lot of different ways it could happen, but the thesis only requires one. We could imagine a future society that has become more and more dependent on AIs and has semiautonomous domestic and indu... (read more)

roha10

It seems to me as if we expect the same thing, then: If humanity were largely gone (e.g. through several engineered pandemics) and, as a consequence, the world economy came to a halt, an ASI would probably be able to sustain itself long enough by controlling existing robotic machinery, i.e. without having to make dramatic leaps in nanotech or other technology first. What I wanted to express with "a moderate increase of intelligence" is that it won't take an ASI at the level of GPT-142 to do that; GPT-7 together with current projects in robotics might suffice to... (read more)

roha10

The question in point 2 is whether an ASI could sustain itself without humans and without new types of hardware, such as Drexler-style nanomachinery, which to a significant portion of people (me not included) seems too hypothetical to be of actual concern. I currently don't see why the answer to that question should be a highly certain no, as you seem to suggest. Here are some thoughts:

  • The world economy is largely catering to human needs, such as nutrition, shelter, healthcare, personal transport, entertainment and so on. Phenomena like massive food w
... (read more)
5Eliezer Yudkowsky
I rather expect that existing robotic machinery could be controlled by ASI, rather than "moderately smart intelligence", into picking up the pieces of a world economy after it collapses, or that if for some weird reason it was trying to play around with static-cling spaghetti, it could pick up the pieces of the economy that way too.
roha20

An attempt to optimize for a minimum of abstractness, picking up what was communicated here:

  1. How could an ASI kill all humans? Setting off several engineered pandemics a month with a moderate increase of infectiousness and lethality compared to historical natural cases.
  2. How could an ASI sustain itself without humans? Conventional robotics with a moderate increase of intelligence in planning and controlling the machinery.

People coming in contact with that argument will check its plausibility, as they will with a hypothetical nanotech narrative. If so inclined... (read more)

It's false that currently existing robotic machinery controlled by moderately smart intelligence can pick up the pieces of a world economy after it collapses. One well-directed algae cell could, but not existing robots controlled by moderate intelligence.

roha21

It is an argument by induction based on a naive extrapolation of a historic trend.

This characterization could be a good first step towards constructing a convincing counterargument. Are there examples of other arguments by induction that simply extrapolate historical trends, where it is much more apparent that this is an unreliable form of reasoning? To be intuitive, it must not be too technical; e.g. "people claiming to have found a proof of Fermat's last theorem have always been wrong in the past (until Andrew Wiles came along)" would probably not work well.

2Max TK
Good idea! I thought of this one: https://energyhistory.yale.edu/horse-and-mule-population-statistics/
roha20

There seems to be a clear pattern of various people downplaying AGI risk by framing it as mere speculation, science fiction, hysteria, unscientific, religious, and other variations of the idea that it is not based on sound foundations, especially when it comes to claims of considerable existential risk. One way to respond to that is by pointing at existing examples of cutting-edge AI systems showing unintended, or at least unexpected/unintuitive, behavior. Has anyone made a reference collection of such examples that are suitable for grounding... (read more)