All of kolmplex's Comments + Replies

This system card seems to only cover o1-preview and o1-mini, and excludes their best model o1.

Looks like they are focusing on animated avatars. I expect the realtime photorealistic video to be the main bottleneck, so I agree that removing that requirement will probably speed things up.

2RogerDearnaley
Yes, they're going with a cute Pixar-like style (I gather they hired an ex-Pixar animator). Anime would likely also work for something like this. Both of those might reduce the psychological impact a little by adding an air of unreality, though I suspect a sufficiently interactive conversation would still have a good deal of impact.

Yeah, I also doubt that it will be the primary way of using AI. I'm just saying that AI avatar tech could exist soon and that it will change how the public views AI.

ChatGPT itself is in a bit of a similar situation. It changed the way many people think of AI, even for those who don't find it particularly useful.

0[anonymous]
Absolutely. I kinda imagine Microsoft's Cortana putting her ghostly fingers through foreground apps in Windows, especially native Microsoft apps, to try to help the user out. She would seem to be actually physically helping you and/or actually existing on your computer's desktop. But it's all vestigial extra pixel rendering that isn't helping the user accomplish anything. Even the concept of a gender or a voice for the AI is vestigial.

Thanks for compiling your thoughts here! There's a lot to digest, but I'd like to offer a relevant intuition I have specifically about the difficulty of alignment.

Whatever method we use to verify the safety of a particular AI will likely be extremely underdetermined. That is, we could verify that the AI is safe for some set of plausible circumstances but that set of verified situations would be much, much smaller than the set of situations it could encounter "in the wild".

The AI model, reality, and our values are all high entropy, and our verification/safe... (read more)
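A crude way to make "much, much smaller" concrete, with a made-up feature count (just a toy calculation, not a real estimate):

```python
# If real-world situations were described by even 200 relevant binary features,
# the space of possible situations dwarfs any feasible verification set.
n_features = 200                   # made-up number, just for scale
situation_space = 2 ** n_features  # ~1.6e60 possible situations
verified_cases = 10 ** 12          # a very generous trillion verified situations
print(f"fraction verified: {verified_cases / situation_space:.1e}")  # ~6.2e-49
```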

1Seth Herd
I like this intuitive argument.  Now multiply that difficulty by needing to get many more individual AGIs aligned if we see a multipolar scenario, since defending against misaligned AGI is really difficult.

I think both of those things are worth looking into (for the sake of covering all our bases), but by the time alarm bells go off it's already too late.

It's a bit like a computer virus. Even after Stuxnet became public knowledge, it wasn't possible to just turn it off. And unlike Stuxnet, AI-in-the-wild could easily adapt to ongoing changes.

I've got some object-level thoughts on Section 1. 

With a model of AGI-as-very-complicated-regression, there is an upper bound of how fulfilled it can actually be. It strikes me that it would simply fulfill that goal, and be content.

It'd still need to do risk mitigation, which would likely entail some very high-impact power seeking behavior. There are lots of ways things could go wrong even if its preferences saturate.

For example, it'd need to secure against the power grid going out, long-term disrepair, getting nuked, etc. 

To argue that an AI mig

... (read more)
1zrezzed
I think this feels like the right analogy to consider. And in considering this thought experiment, I'm not sure trying to solve alignment is the only/best way to reduce risks. This hypothetical seems open to reducing risk by 1) better understanding how to detect these actors operating at a large scale, and 2) researching resilient plug-pulling strategies.

Makes sense. From the post, I thought you'd consider 90% as too high an estimate.

My primary point was that estimates of 10% and 90% (or maybe even >95%) aren't much different from a Bayesian evidence perspective. My secondary point was that it's really hard to meaningfully compare different people's estimates because of wildly varying implicit background assumptions.

I might be misunderstanding some key concepts but here's my perspective:

It takes more Bayesian evidence to promote the subjective credence assigned to a belief from negligible to non-negligible than from non-negligible to pretty likely. See the intuition on log odds and locating the hypothesis.

So, going from 0.01% to 1% requires more Bayesian evidence than going from 10% to 90%. The same thing applies for going from 99% to 99.99%.
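To make the arithmetic concrete, here's a quick log-odds check in Python (just illustrative, using the numbers above):

```python
import math

def bits_of_evidence(p_from: float, p_to: float) -> float:
    """Bits of evidence needed to move a credence from p_from to p_to,
    measured as the change in log2 odds."""
    def log_odds(p):
        return math.log2(p / (1 - p))
    return log_odds(p_to) - log_odds(p_from)

print(bits_of_evidence(0.0001, 0.01))   # 0.01% -> 1%:     ~6.7 bits
print(bits_of_evidence(0.10, 0.90))     # 10%   -> 90%:    ~6.3 bits
print(bits_of_evidence(0.99, 0.9999))   # 99%   -> 99.99%: ~6.7 bits
```

So escaping "negligible" takes at least as much evidence as the whole jump from 10% to 90%.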

A person could reasonably be considered super weird for thinking something with a really low prior has even a 10% chance of bein... (read more)

4Shmi
I agree that 10-50-90% is not unreasonable in a pre-paradigmatic field. Not sure how it translates into words. Anything more confident than that seems like it would hit the limits of our understanding of the field, which is my main point.

This DeepMind paper explores some intrinsic limitations of agentic LLMs. The basic idea is (my words):

If the training data used by an LLM is generated by some underlying process (or context-dependent mixture of processes) that has access to hidden variables, then an LLM used to choose actions can easily go out-of-distribution.

For example, suppose our training data is a list of a person's historical meal choices over time, formatted as tuples that look like (Meal Choice, Meal Satisfaction). The training data might look like (Pizza, Yes)(Cheeseburger, ... (read more)
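Just to gesture at the kind of setup I mean, here's a toy data generator with a hidden "craving" variable (the names and probabilities are made up, not the paper's actual formalism):

```python
import random

MEALS = ["Pizza", "Cheeseburger"]

def logged_history(n_steps: int):
    """Generate (meal, satisfaction) pairs driven by a hidden craving
    that never appears in the logged data."""
    craving = random.choice(MEALS)
    history = []
    for _ in range(n_steps):
        # The person usually eats what they're craving, and is satisfied when they do.
        meal = craving if random.random() < 0.9 else random.choice(MEALS)
        history.append((meal, "Yes" if meal == craving else "No"))
        # Cravings drift occasionally.
        if random.random() < 0.1:
            craving = random.choice(MEALS)
    return history

# In the logs, seeing "Pizza" is strong evidence that the hidden craving is pizza,
# so (Pizza, Yes) dominates. But when a model trained on these logs *chooses* the
# meal, its own choice carries no information about the craving, and runs like
# (Pizza, No)(Pizza, No)... become far more common than anything it saw in training.
print(logged_history(5))
```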

I think that humans are sorta "unaligned", in the sense of being vulnerable to Goodhart's Law.

A lot of moral philosophy is something like:

  1. Gather our odd grab bag of heterogeneous, inconsistent moral intuitions
  2. Try to find a coherent "theory" that encapsulates and generalizes these moral intuitions
  3. Work through the consequences of the theory and modify it until you are willing to bite all the implied bullets. 

The resulting ethical system often ends up having some super bizarre implications and usually requires specifying "free variables" that are (arguab... (read more)