kolmplex — LessWrong

Replying to OpenAI o1

kolmplex 1y

This system card seems to cover only o1-preview and o1-mini, and to exclude their best model, o1.

Replying to The Next ChatGPT Moment: AI Avatars

kolmplex 2y

Looks like they are focusing on animated avatars. I expect the realtime photorealistic video to be the main bottleneck, so I agree that removing that requirement will probably speed things up.

Replying to The Next ChatGPT Moment: AI Avatars

kolmplex 2y

Yeah, I also doubt that it will be the primary way of using AI. I'm just saying that AI avatar tech could exist soon and that it will change how the public views AI.

ChatGPT itself is in a similar situation. It changed the way many people think of AI, even those who don't find it particularly useful.

The Next ChatGPT Moment: AI Avatars

kolmplex, southpaw

Epistemic Status: Speculative. Dependent on intuitions about near-term AI tech and human psychology.

Claim: Within the next 1-3 years, many people will have an interaction with an AI avatar that feels authentically human. This will significantly amplify the public perception of current AI capabilities and risks.

An AI avatar is a realistic AI-generated render of a human (speech and video) that can have a real-time conversation with a human, for example over a video call.

The individual components needed to implement AI avatars already exist. AI is capable of holding a conversation over text, transcribing speech to text, and synthesizing natural-sounding speech.^[1] Generating photorealistic video of a talking human is currently limited, but still impressive and making...

Replying to The bullseye framework: My case against AI doom

kolmplex 3y

Thanks for compiling your thoughts here! There's a lot to digest, but I'd like to offer a relevant intuition I have specifically about the difficulty of alignment.

Whatever method we use to verify the safety of a particular AI will likely be extremely underdetermined. That is, we could verify that the AI is safe for some set of plausible circumstances but that set of verified situations would be much, much smaller than the set of situations it could encounter "in the wild".

The AI model, reality, and our values are all high entropy, and our verification/safety methods are likely to be comparatively low entropy. The set of AIs that pass our tests will have members whose properties haven't been fully constrained.

This isn't even close to a complete argument, but I've found it helpful as an intuition fragment.

Replying to Adumbrations on AGI from an outsider

kolmplex 3y

I think both of those things are worth looking into (for the sake of covering all our bases), but by the time alarm bells go off it's already too late.

It's a bit like a computer virus. Even after Stuxnet became public knowledge, it wasn't possible to just turn it off. And unlike Stuxnet, AI-in-the-wild could easily adapt to ongoing changes.

Replying to Adumbrations on AGI from an outsider

kolmplex 3y

I've got some object-level thoughts on Section 1.

With a model of AGI-as-very-complicated-regression, there is an upper bound of how fulfilled it can actually be. It strikes me that it would simply fulfill that goal, and be content.

It'd still need to do risk mitigation, which would likely entail some very high-impact power seeking behavior. There are lots of ways things could go wrong even if its preferences saturate.

For example, it'd need to secure against the power grid going out, long-term disrepair, getting nuked, etc.

To argue that an AI might change its goals, you need to develop a theory of what’s driving those changes–something like, AI wants more utils–and probably need something like sentience,

...

Replying to How can one rationally have very high or very low probabilities of extinction in a pre-paradigmatic field?

kolmplex 3y

Makes sense. From the post, I thought you'd consider 90% as too high an estimate.

My primary point was that estimates of 10% and 90% (or maybe even >95%) aren't much different from a Bayesian evidence perspective. My secondary point was that it's really hard to meaningfully compare different people's estimates because of wildly varying implicit background assumptions.

Replying to How can one rationally have very high or very low probabilities of extinction in a pre-paradigmatic field?

kolmplex 3y

I might be misunderstanding some key concepts but here's my perspective:

It takes more Bayesian evidence to promote the subjective credence assigned to a belief from negligible to non-negligible than from non-negligible to pretty likely. See the intuition on log odds and locating the hypothesis.

So, going from 0.01% to 1% requires more Bayesian evidence than going from 10% to 90%. The same thing applies for going from 99% to 99.99%.
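To make the comparison concrete, here is a minimal sketch that measures evidence as the change in log odds, in bits. The helper name `evidence_bits` is made up for illustration, and treating "Bayesian evidence" as log base 2 of the odds ratio is one common convention, not the only one.

```python
import math

def evidence_bits(p_from: float, p_to: float) -> float:
    """Bits of evidence needed to move a credence from p_from to p_to.

    Computed as log2 of the ratio of posterior odds to prior odds.
    (Hypothetical helper, named here for illustration only.)
    """
    odds_from = p_from / (1.0 - p_from)
    odds_to = p_to / (1.0 - p_to)
    return math.log2(odds_to / odds_from)

low_to_nonnegligible = evidence_bits(0.0001, 0.01)   # 0.01% -> 1%
unlikely_to_likely = evidence_bits(0.10, 0.90)       # 10% -> 90%
likely_to_near_certain = evidence_bits(0.99, 0.9999) # 99% -> 99.99%

print(f"0.01% -> 1%:    {low_to_nonnegligible:.2f} bits")
print(f"10% -> 90%:     {unlikely_to_likely:.2f} bits")
print(f"99% -> 99.99%:  {likely_to_near_certain:.2f} bits")
```

Under this convention, 0.01% to 1% and 99% to 99.99% each take about 6.66 bits (an odds ratio of 101), while 10% to 90% takes about 6.34 bits (an odds ratio of 81).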

A person could reasonably be considered super weird for thinking something with a really low prior has even a 10% chance of being true, but it isn't much weirder to think something has a 10% chance of being true than a 90% chance of...

kolmplex's Shortform

kolmplex

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

Replying to Agentized LLMs will change the alignment landscape

kolmplex 3y*

This DeepMind paper explores some intrinsic limitations of agentic LLMs. The basic idea is (my words):

If the training data used by an LLM is generated by some underlying process (or context-dependent mixture of processes) that has access to hidden variables, then an LLM used to choose actions can easily go out-of-distribution.

For example, suppose our training data is a list of a person's historical meal choices over time, formatted as tuples that look like (Meal Choice, Meal Satisfaction). The training data might look like (Pizza, Yes)(Cheeseburger, Yes)(Tacos, Yes).

When the person originally chose what to eat, they might have had some internal idea of what food they wanted to eat that day, so the...
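A toy simulation of the meal example, as a rough sketch rather than anything from the paper. The assumptions are mine: the hidden variable is a uniformly random daily craving, the person always picks the meal they crave, satisfaction is "Yes" exactly when the meal matches the craving, and the "LLM" is reduced to an imitator that samples meals from the logged choices without seeing the craving. All function names are hypothetical.

```python
import random

MEALS = ["Pizza", "Cheeseburger", "Tacos"]

def person_log(n, rng):
    """Log of (Meal Choice, Meal Satisfaction) tuples from the person.

    Assumption: the person sees a hidden craving and picks the matching meal.
    The craving itself is never written to the log.
    """
    log = []
    for _ in range(n):
        craving = rng.choice(MEALS)  # hidden variable
        meal = craving               # the choice depends on the hidden variable
        log.append((meal, "Yes" if meal == craving else "No"))
    return log

def imitator_log(n, rng, training_log):
    """Same setting, but meals are chosen by an imitator of the log.

    Assumption: the imitator only knows how often each meal was logged,
    so it samples a logged meal at random, blind to today's craving.
    """
    logged_meals = [meal for meal, _ in training_log]
    log = []
    for _ in range(n):
        craving = rng.choice(MEALS)      # still hidden from the imitator
        meal = rng.choice(logged_meals)  # choice ignores the craving
        log.append((meal, "Yes" if meal == craving else "No"))
    return log

def satisfaction_rate(log):
    """Fraction of entries whose satisfaction is 'Yes'."""
    return sum(1 for _, satisfied in log if satisfied == "Yes") / len(log)

rng = random.Random(0)
train = person_log(3000, rng)
rollout = imitator_log(3000, rng, train)

print("person:  ", satisfaction_rate(train))
print("imitator:", satisfaction_rate(rollout))
```

In this setup the training log contains only "Yes" entries, while the imitator's own rollouts contain many "No" entries (roughly two thirds, given three equally likely cravings), which is a kind of tuple the training data never showed.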

kolmplex 3y* Quick Take

I think that humans are sorta "unaligned", in the sense of being vulnerable to Goodhart's Law.

A lot of moral philosophy is something like:

1. Gather our odd grab bag of heterogeneous, inconsistent moral intuitions.
2. Try to find a coherent "theory" that encapsulates and generalizes these moral intuitions.
3. Work through the consequences of the theory and modify it until you are willing to bite all the implied bullets.

The resulting ethical system often ends up having some super bizarre implications and usually requires specifying "free variables" that are (arguably) independent of our original moral intuitions.

In fact, I imagine that optimizing the universe according to my moral framework looks quite Goodhartian to many people.

Some examples of implications of my...