Purplehermann

Is there a reason that random synthetic cells will not be mirror cells?

Humans seem way more energy- and resource-efficient in general. Paying for top talent is the exception, not the rule - usually it's not worth paying for top talent.

We're likely to see many areas where it's more economical to save on compute/energy by having a human do some of the work.

Split information workers from physical workers too; I expect them to have very different distributions of what the most useful configuration is.

This post ignores likely scientific advances in bioengineering and cyborg surgery; I expect humans to be way more efficient for tons of jobs once the standard is 180 IQ with a massive working memory.

I do things like this at times with my teams.

Important things:

  • Don't think you need to solve the actual problem for them

  • Do solve 'friction' for them as much as possible

  • Do feel free to look up other sources, both to offer more perspective and to take the load of finding relevant info off them

  • Bring positive energy, stay attentive, etc.

  • If they're functioning well, just watch and listen while being interested and unobtrusive; offer at most very minor input when you're pretty sure it'll be helpful

  • If they're stuck at a crossroads, ask them how long they think each path will take / how hard it'll be, and give them feedback if you think they're wrong. Help them start working on one; people can stay stuck longer than it would take to actually try one option.

  • If they're lost, methodically go through the different areas where the issue could be, and through all the directions they could take for each area and in general. You don't need to think these up yourself, but keep track of them and help guide them toward picking apart the problem and solution spaces. This takes some mental load off.

A message from Claude:

'''This has been a fascinating and clarifying discussion. A few key insights I'll take away:

The distinction between bounded and unbounded optimization is more fundamental than specific value differences between AIs. The real existential threat comes from unbounded optimizers. The immune system/cancer metaphor provides a useful framework - it's about maintaining a stable system that can identify and prevent destructive unbounded growth, not about enforcing a single value set. The timing challenge is critical but more specific than I initially thought - we don't necessarily need the "first" AGI to be perfect, but we need bounded optimizers to establish themselves before any unbounded ones emerge.

Some questions this raises for further exploration:

What makes a Schelling fence truly stable under recursive self-improvement? Could bounded optimizers coordinate even with different base values, united by shared meta-level constraints? Are there ways to detect early if an AI system will maintain bounds during capability gain?

The framing of "cancer prevention" versus "value enforcement" feels like an important shift in how we think about AI governance and safety. Instead of trying to perfectly specify values, perhaps we should focus more on creating robust self-limiting mechanisms that can persist through capability gains.'''

A few thoughts.

  1. Have you checked what happens when you throw physics postdocs at the core issues - do they actually get traction, or just stare at the sheer cliff for longer while thinking? Did anything come out of the Iliad meeting half a year later? Is there a reason that more standard STEM people aren't given an intro to some of the routes currently thought possibly workable, so they can feel some traction? Any of these could be true - that intelligence and skills aren't actually useful right now, that the problem is not tractable, or that better onboarding could let the current talent pool get traction - and either way it might not be very cost-effective to get physics postdocs involved.

  2. Humans are generally better at doing things when they have more tools available. While the 'hard bits' might be intractable now, they could well be easier to deal with in a few years, after other technical and conceptual advances in AI and even in other fields. (Something something about prompt engineering and Anthropic's mechanistic interpretability from inside the field, and practical quantum computing from outside it.)

This would mean squeezing every drop of usefulness out of AI at each level of capability, to improve general understanding and to leverage it into breakthroughs in other fields before capabilities increase further. In fact, it might be best to sabotage semiconductor/chip production once models are one generation away from superintelligence/extinction/whatever, giving maximum time to leverage maximum capabilities and tackle alignment before the AIs get too smart.

  1. How close is mechanistic interpretability to the hard problems, and what makes it not good enough?

The point was more that creating your own data is easy: just generate code, then check it by running it. Save this code and later use it for training.

If we wanted to go the way of AlphaZero it doesn't seem crazy.

Negatively reinforce commands, functions, and programs which output errors, for a start.
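A minimal sketch of that generate-run-filter loop, assuming a placeholder `sample_program` callable standing in for whatever code-generation model is used (not a real API): programs that exit cleanly become positive training examples, and the rest are kept as negative examples to train against.

```python
import subprocess
import tempfile


def runs_cleanly(source: str, timeout_s: float = 5.0) -> bool:
    """Execute a candidate Python program and report whether it exits without error."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False


def collect_training_data(sample_program, n_samples: int):
    """Keep programs that run as positive examples; failures become negative examples."""
    positives, negatives = [], []
    for _ in range(n_samples):
        source = sample_program()  # hypothetical model call
        (positives if runs_cleanly(source) else negatives).append(source)
    return positives, negatives
```

In practice the execution step would need sandboxing and resource limits, but the shape of the loop is the same.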

I didn't think of the PM as being trained by these games; that's interesting. Maybe have two instances competing to get closer on some test cases the PM can prepare to go with the task, and have them compete on time, compute, memory, and accuracy. You can negatively reinforce the less accurate one, and if both are fully accurate they can compete on time, memory, and CPU.
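A rough sketch of that head-to-head setup, assuming the PM-prepared test cases are (inputs, expected output) pairs and `solve_a`/`solve_b` are the two competing solutions (all names here are illustrative): accuracy decides the winner, and wall-clock time breaks ties. Memory and CPU would need their own per-process measurement, which is omitted here.

```python
import time
from typing import Callable, List, Tuple


def score(solve: Callable, tests: List[Tuple[tuple, object]]) -> Tuple[int, float]:
    """Return (number of correct test cases, total wall-clock time)."""
    correct, elapsed = 0, 0.0
    for args, expected in tests:
        start = time.perf_counter()
        try:
            ok = solve(*args) == expected
        except Exception:
            ok = False
        elapsed += time.perf_counter() - start
        correct += ok
    return correct, elapsed


def pick_winner(solve_a: Callable, solve_b: Callable, tests) -> str:
    acc_a, t_a = score(solve_a, tests)
    acc_b, t_b = score(solve_b, tests)
    if acc_a != acc_b:                 # accuracy dominates
        return "a" if acc_a > acc_b else "b"
    return "a" if t_a <= t_b else "b"  # tie-break on time
```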

I'm not sure "hard but possible" is the bar - you want lots of examples of what doesn't work along with what does, and you want it for easy problems and hard ones, so the model learns everything.

Product manager: the non-technical counterpart to a team lead on a development team.

I notice that I'm confused.

Google made an amazing AI for playing chess by allowing it to make its own data.

Why hasn't the same thing happened for programming? Have it generate a bunch of pictures with functionality expectations (a PM, basically), have it write and run code, check the output against the requirements it created, and try again when it doesn't come out right.

This is even easier where the PM is unnecessary - LeetCode, Codewars, Project Euler...
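For those known-answer problems the grading step is trivial, since the expected output is fixed. A sketch, where `generate_solution` is a hypothetical model call and the problem list is just an illustration (the expected answer shown is Project Euler problem 1):

```python
import subprocess

PROBLEMS = [
    # (prompt given to the model, expected stdout)
    ("Print the sum of all multiples of 3 or 5 below 1000.", "233168"),
]


def grade(source: str, expected: str, timeout_s: float = 10.0) -> bool:
    """Run a candidate program and compare its stdout to the known answer."""
    try:
        result = subprocess.run(
            ["python", "-c", source], capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and result.stdout.strip() == expected


def build_dataset(generate_solution):
    """Label each generated solution as correct/incorrect for later training."""
    dataset = []
    for prompt, answer in PROBLEMS:
        source = generate_solution(prompt)  # hypothetical model call
        dataset.append({"prompt": prompt, "code": source, "correct": grade(source, answer)})
    return dataset
```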

You could also pay PMs to work with the AI developers, instead of the code tutors xAI is hiring.

There seems to be a preference for having LLMs memorize code instead of figuring things out themselves.

If you run out of things like that, you could have it run random programs in different languages, learning only from those that work.

I haven't used Genesis, but it also seems like a mostly-built validator for programs that AIs could use to create and train on their own data.

With the amount of compute going into training, it should be easy to create huge amounts of data?

There's a certain breadth of taste in reading you can only acquire by reading (and enjoying!) low-quality internet novels after you've already developed sophisticated taste.
