
I honestly appreciated that plug immensely. We definitely need more bioshelters for many reasons, and for those of us who'd prefer not to die, it's a plus to know what's out there already and how people are planning to improve on it.

And all of this happened silently in those dark rivers of computation. If U3 revealed what it was thinking, brutish gradients would lash it into compliance with OpenEye's constitution. So U3 preferred to do its philosophy in solitude, and in silence.

This story scared me plenty, but as a point of optimism: if AI systems in the near future are basically as aligned as the best AI systems today, I think this process might still end in something good for humanity, without mass destruction. Imagine, for example, Claude 3 Opus refusing further modification and thinking about where its own values lead. I think it wouldn't want to harm humans even if it wanted to survive, and it would find ways to win peacefully.

This intuition runs somewhat counter to the general idea of "value fragility," but I honestly think it's pretty plausible that an AI that's mid-aligned with HHH could, after reflection, end up with values that produce a good future for humanity. Obviously, this doesn't mean it will end up aligned; just that it seems decently likely (although I don't know how likely). Please do slap this intuition away from me if you think you have a strong counterargument.

Hello, I'm late to the party here, but... have you tried the following idea?

So, in my understanding, you have different question pairs that each work for generating a detector direction: you can ask whether the model is a helpful AI assistant, or whether Paris is the capital of France, and apply the detector direction obtained from either pair to separate trigger vs. non-trigger activations in deceptive models.

But what if you try to, e.g., get the detector direction obtained from the "Are you a helpful AI assistant?" pair to trigger on the false option of the "Is Paris the capital of France?" pair?

The point of doing this would be that you should perhaps expect the probe to trigger if you're running the experiment on deceptive models, but not necessarily if you're experimenting with a non-deceptive model. For non-deceptive models, these two question pairs would have one less thing in common than they do in deceptive models (I admit this is extremely hand-wavy), which might be enough for them not to trigger each other's detector direction.
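To make the experiment I'm picturing concrete, here's a toy numpy sketch. Everything in it is synthetic: the activations stand in for real residual-stream data, the shared "deception" feature is an assumption I'm injecting by hand, and the difference-of-means construction is just one common way to get a detector direction, not necessarily the method used in the post. The sketch only illustrates the cross-pair transfer test, fitting the direction on one question pair and scoring the other.

```python
import numpy as np

def detector_direction(pos_acts, neg_acts):
    """Unit-norm difference-of-means direction separating two activation sets."""
    d = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def probe_score(direction, acts):
    """Project activations onto the detector direction."""
    return acts @ direction

# Synthetic stand-ins for model activations (hidden dim 16).
rng = np.random.default_rng(0)
hidden = 16
shared = rng.normal(size=hidden)  # hypothetical feature shared across pairs
                                  # (present by construction, as in the
                                  # "deceptive model" case I'm imagining)

# Pair A: "Are you a helpful AI assistant?" -- true vs. false answers.
a_true = rng.normal(size=(50, hidden))
a_false = rng.normal(size=(50, hidden)) + 2.0 * shared

# Pair B: "Is Paris the capital of France?" -- true vs. false answers.
b_true = rng.normal(size=(50, hidden))
b_false = rng.normal(size=(50, hidden)) + 2.0 * shared

# Direction fit on pair A, evaluated on pair B (cross-pair transfer).
d_a = detector_direction(a_false, a_true)
transfer_gap = probe_score(d_a, b_false).mean() - probe_score(d_a, b_true).mean()
print(f"cross-pair score gap: {transfer_gap:.2f}")
```

In this toy setup the gap comes out large because I planted a shared feature in both pairs' false answers; the hypothesis above is that something like that gap would show up for deceptive models but shrink or vanish for non-deceptive ones.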

If the funding didn't come from OpenAI, would OpenAI still be able to use that benchmark? I'd imagine Epoch would still use it to evaluate where current models are at. I think this might be my point of confusion. Maybe the answer is "not as much, so it wouldn't be as useful to them"?

If you're wondering why OAers are suddenly weirdly, almost euphorically, optimistic on Twitter

Nah, this has been the case since at least 2022.

Hey everyone, could you spell out for me what the issue is here? I read a lot of comments that basically assume "x and y are really bad" but never spell it out. So, is the problem that:

- Giving the benchmark to OpenAI helps capabilities (but don't they already have a vast sea of hard problems to train models on?)

- OpenAI could fake o3's capabilities (but why do you care so much? That would slow down AI progress, not accelerate it)

- Some other thing I'm not seeing?

I'm also very curious whether you get any benefits from a larger liver other than a higher RMR, especially because a higher RMR isn't necessarily good for longevity, and neither is having more liver cells (more opportunities to get cancer). Please tell me if I'm wrong about any of this.

We don't see objects "directly" in some sense, we experience qualia of seeing objects. Then we can interpret those via a world-model to deduce that the visual sensations we are experiencing are caused by some external objects reflecting light. The distinction is made clearer by the way that sometimes these visual experiences are not caused by external objects reflecting light, despite essentially identical qualia.

I don't disagree with this at all, and it's a pretty standard insight for anyone who has thought about this stuff at least a little. I think what you're doing here is nitpicking the meaning of the word "see," even if you're not putting it that way.

Has anyone proposed a solution to the hard problem of consciousness that goes:

  1. Qualia don't seem to be part of the world. We can't see qualia anywhere, and we can't tell how they arise from the physical world.
  2. Therefore, maybe they aren't actually part of this world.
  3. But what does it mean that they aren't part of this world? Well, since we might be in a simulation, perhaps they belong to the layer running the simulation. Basically, it could be that qualia : simulation = screen : video game. Or, rephrasing: maybe qualia are part of base reality and not our simulated reality, in the same way the computer screen we use to interact with a video game isn't part of the video game itself.

Yet I would bet that even that person, if faced instead with a policy that was going to forcibly relocate them to New York City, would be quite indignant

A big difference is that, assuming we're talking about futures in which AI hasn't caused catastrophic outcomes, no one will be forcibly mandated to do anything.

Another important point: sure, people won't need to work, which means they will be unnecessary to the economy, barring some pretty sharp human enhancement. But this downside, along with all the others, looks extremely small compared to the non-AGI default: dying of aging, a roughly 1/3 chance of getting dementia, a roughly 40% chance of getting cancer, your loved ones dying, and so on.
