All of JuliaHP's Comments + Replies

JuliaHP

I do believe that if Altman does manage to create his superAIs, the first such AI eats Altman and makes squiggles. But if I were to engage in the hypothetical where nice corrigible superassistants are just magically created, Altman does not appear to take seriously this future he claims to be steering towards.

The world where "everyone has a superassistant" is inherently incredibly volatile/unstable/dangerous, due to an incredibly large offence-defence asymmetry of superassistants attacking fragile fleshbags (with optimized viruses, bacteria, molecules, nanobo...

james oofou
If he were aiming for an authoritarian outcome, would it make any sense for him to say so? I don't think so. Outlining such a plan would quite probably lead to him being ousted, and would have little upside. The reason I think it would lead to his ouster is that most Americans' reaction to the idea of an authoritarian AI regime would be strongly negative rather than positive. So I think his current actions are consistent with his plan being something authoritarian.

I think "enforce NAP then give everyone a giant pile of resources to do whatever they want with" is a reasonable first-approximation idea regarding what to do with ASI, and it sounds consistent with Altman's words.

But I don't believe that he's actually going to do that, so I think it's just (3).

rvnnt
Out of (1)-(3), I think (3)[1] is clearly most probable:

* I think (2) would require Altman to be deeply un-strategic/un-agentic, which seems in stark conflict with all the skillful playing-of-power-games he has displayed.
* (3) seems strongly in-character with the kind of manipulative/deceitful maneuvering-into-power he has displayed thus far.
* I suppose (1) is plausible; but for that to be his only motive, he would have to be rather deeply un-strategic (which does not seem to be the case).

(Of course one could also come up with other possibilities besides (1)-(3).)[2]

1. or some combination of (1) and (3)
2. E.g. maybe he plans to keep ASI to himself, but use it to implement all-of-humanity's CEV, or something. OTOH, I think the kind of person who would do that would not exhibit so much lying, manipulation, exacerbating-arms-races, and gambling-with-everyone's-lives. Or maybe he doesn't believe ASI will be particularly impactful; but that seems even less plausible.
cousin_it

I suppose the superassistants could form coalitions and end up as a kind of "society" without too much aggression. But this all seems moot, because superassistants will anyway get outcompeted by AIs that focus on growth. That's the real danger.

Viliam

I don't see a reason why we should trust Altman's words on this topic more than his previous words on making OpenAI a non-profit.

Before Singularity, I think it just means that OpenAI would like to have everyone as a customer, not just the rich (although the rich will get higher quality), which makes perfect sense economically. Even if governments paid you billions, it would still make sense to also collect $20 from each person on the planet individually.

After Singularity... this just doesn't make much sense, for the reasons you wrote.

I was trying to steelm...

JuliaHP

(That broad technical knowledge, as opposed to tacit skills, is the main reason you value a physics PhD is a really surprising response to me, and seems like an important part of the model that didn't come across in the post.)

JuliaHP

Curious about what it would look like to pick up the relevant skills, especially the subtle/vague/tacit skills, in an independent-study setting rather than in academia. Also curious about the value of doing this, i.e. maybe it's just a stupid idea and it's better to just go do a PhD. Is the purpose of a PhD to learn the relevant skills, or to filter for them? (If you have already written stuff which suffices as a response, I'd be happy to be pointed to the relevant bits rather than having them restated.)

"Broad technical knowledge" should be in some sense the "easiest" (... (read more)

I currently think broad technical knowledge is the main requisite, and I think self-study can suffice for the large majority of that in principle. The main failure mode I see would-be autodidacts run into is motivation, but if you can stay motivated then there's plenty of study materials.

For practice solving novel problems, just picking some interesting problems (preferably not AI) and working on them for a while is a fine way to practice.

JuliaHP

(warning: armchair evolutionary biology)

Another consideration for orca intelligence: they dodge the Fermi paradox by not having arms.

Assume the main driver of genetic selection for intelligence is the social arms-race. As soon as a species gets intelligent enough (see humans) from this arms-race, they start using their intelligence to manipulate the environment, and start civilization. But orcas mostly lack the external organs for manipulating the environment, so they can keep social-arms-racing-boosting-intelligence way past the point of "criticality"...

JuliaHP

"As a result, we can make progress toward automating interpretability research by coming up with experimental setups that allow AIs to iterate."
This sounds exactly like the kind of progress which is needed in order to get closer to game-over-AGI. Applying current methods of automation to alignment seems fine, but if you are trying to push the frontier of what intellectual progress can be achieved using AIs, I fail to see your comparative advantage relative to pure capabilities researchers.

I do buy that there might be credit to the idea of developing the i...

jacquesthibs
Exactly right. This is the first criticism I hear every time about this kind of work, and one of the main reasons I believe the alignment community is dropping the ball on this.

I only intend on sharing work output (a paper on a better technique for interp, not the infrastructure setup; things similar to Transluce) where necessary, and not the infrastructure. We don’t need to share or open-source what we think isn’t worth it.

That said, the capabilities folks will be building stuff like this by default, as they already have (Sakana AI). Yet I see many paths to automating sub-areas of alignment research where we will be playing catch-up to capabilities when the time comes, because we were so afraid of touching this work. We need to put ourselves in a position to absorb a lot of compute.
JuliaHP

Transfer learning is dubious; doing philosophy has worked pretty well for me thus far for learning how to do philosophy. More specifically: pick a topic you feel confused about or a problem you want to solve (AI kill everyone oh no?). Sit down and try to do original thinking, and probably use some external tool of preference to write down your thoughts. Then, either live or afterwards, introspect on whether your process is working and how you can improve it; repeat.
This might not be the most helpful, but most people seem to fail at "being comfortable sitting down ...

JuliaHP

> It seems like all of the many correct answers to what X would've wanted might not include the AGI killing everyone.

Yes, but if it wants to kill everyone, it will pick one which does. (By analogy, the space of "all possible actions" also contains some friendly actions.)

> Wrt the continuity property, I think Max Harm's corrigibility proposal has that

I think it understands this and is aiming to have that, yeah. It looks like a lot of work needs to be done to flesh it out.

I don't have a good enough understanding of ambitious value learning & Roger Dearnaley's proposal to properly comment on these. Skimming + priors put fairly low odds on them dealing with this in the proper manner, but I could be wrong.

Seth Herd
I don't think Dearnaley's proposal is detailed enough to establish whether or not it would really, in practice, have a "basin of attraction". I take it to be roughly the same idea as ambitious value learning and CEV. All of them might be said to have a basin of attraction (and therefore your continuity property) for this reason: if they initially misunderstand what humans want (a form of your delta), they should work to understand it better and make sure they understand it, as a byproduct of having their goal be not a certain set of outcomes, but a variable standing for outcomes humans prefer, while the exact value of that variable can remain unknown and be refined as one possible sub-goal.

Another related thing that springs to mind: all goals may have your continuity property with a slightly different form of delta. If an AGI has one main goal, and a few other less important goals/values, those might (in some decision-making processes) be eliminated in favor of the more important goal (if continuing to have those minor goals would hurt its ability to achieve the more important goal).

The other important piece to note about the continuity property is that we don't know how large a delta would be ruinous. It's been said that "value is fragile", but the post "But exactly how complex and fragile?" got almost zero meaningful discussion. Nobody knows until we get around to working that out. It could be that a small delta in some AGI architectures would just result in a world with slightly more things like dance parties and slightly fewer things like knitting circles, disappointing to knitters but not at all catastrophic. I consider that another important unresolved issue.

Back to your initial point: I agree that other preferences could interact disastrously with the indeterminacy of something like CEV. But it's hard for me to imagine an AGI whose goal is to do what humanity wants but also has a preference for wiping out humanity. But it's not impossible. I guess with t
JuliaHP

The step from "tell AI to do Y" to "AI does Y" is a big part of the entire alignment problem. The reason chatbots might seem aligned in this sense is that the thing you ask for often lives in a continuous space, and when not-too-strong optimization pressure is applied, then when you ask for Y, Y+epsilon is good enough. This ceases to be the case when your Y is complicated and high optimization pressure is applied, UNLESS you can find a Y which has a strong continuity property in the sense you care about, and I am unaware of anyone who knows how to do that.
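
(Roughly, and as a simplified sketch rather than a full formalization: writing Y for the intended target, Y' for the target the optimizer actually hits, d for some distance over specifications, and V for realized value, the continuity property would be something like

\[
  % illustrative symbols only, not notation from the original comment
  \forall \varepsilon > 0 \;\; \exists \delta > 0 : \quad
  d(Y', Y) < \delta \;\Longrightarrow\; \lvert V(Y') - V(Y) \rvert < \varepsilon ,
\]

i.e. a small error in the specified target costs only a small amount of value, even under strong optimization. The failure mode above is that for complicated targets under high optimization pressure, no such delta is known to exist.)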

Not to ...

Oleg Trott
This post is just about alignment of AGI's behavior with its creator's intentions, which is what Yoshua Bengio was talking about. If you wanted to constrain it further, you'd say that in the prompt. But I feel that rigid constraints are probably unhelpful, the way The Three Laws of Robotics are. For example, anyone could threaten suicide and force the AGI to do absolutely anything short of killing other people.
Seth Herd
It seems like all of the many correct answers to what X would've wanted might not include the AGI killing everyone. Wrt the continuity property, I think Max Harm's corrigibility proposal has that, without suffering as obviously from the multiple interpretations you mention. Ambitious value learning is intended to as well, but has more of that problem. Roger Dearnaley's alignment as a basin of attraction addresses that stability property more directly. Sorry I don't have links handy.
JuliaHP

While I have a lot of respect for many of the authors, this work feels to me like it's mostly sweeping the big problems under the rug. It might at most be useful for AI labs to make a quick buck, or do some safety-washing, before we all die. I might be misunderstanding some of the approaches proposed here, and if so some of my critiques might be invalid.

My understanding is that the paper proposes that the AI implements and works with a human-interpretable world model, and that safety specifications are given in this world-model/ontology.

But given an ASI with ...

JuliaHP

You can totally have something which is trying to kill humanity in this framework though. Imagine something in the style of chaos-GPT, locally agentic & competent enough to use state-of-the-art AI biotech tools to synthesize dangerous viruses or compounds to release into the atmosphere. (Note that in this example the critical part is the narrow-AI biotech tools, not the chaos-agent.)

You don't need solutions to embedded agency, goal-content integrity & the like to build this. It is easier to build and is earlier in the tech-tree than crisp maximizers...

Answer by JuliaHP

Unless I misunderstand the confusion, a useful line of thought which might resolve some things:

Instead of analyzing whether you yourself are conscious or not, analyze what is causally upstream of your mind thinking that you are conscious, or your body uttering the words "I am conscious".

Similarly you could analyze whether an upload would think similar thoughts, or say similar things. What about a human doing manual computations? What about a pure mathematical object?

A couple of examples of where to go from there:
- If they have the same behavior, perh...

JuliaHP

Many more are engaged in AI Safety in other ways, e.g. as PhD students or independent researchers. These are just the positions we know about; we have not yet done a comprehensive survey.


Worth mentioning that most of the Cyborgism community founders came out of or did related projects in AISC beforehand.

Remmelt
Oh yeah, I totally forgot to mention that. Thank you!

I interpret the post you linked as trying to solve the problem of pointing to things in the real world. Being able to point to things in the real world in a way which is ontologically robust is probably necessary for alignment. However, "gliders", "strawberries", and "diamonds" seem like incredibly complicated objects to point to in a way which is ontologically robust, and it is not clear that being able to point to these objects actually leads to any kind of solution.

What we are interested in is research into how to create a statistically unique enough...

Recently we modified QACI to give a score over actions, instead of over worlds. This should allow weaker systems inner-aligned to QACI to output weaker, non-DSA actions, such as the textbook from the future, or just human-readable advice on how to end the acute risk period. Stronger systems might output instructions for how to go about solving corrigible AI, or something to that effect.
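
(As a rough sketch in simplified notation, not the full formalism: instead of the system being pointed at a utility over worlds, u : W -> R, which it would have to maximize by steering the whole world, it is pointed at a score over candidate actions,

\[
  % simplified illustration; s and \mathcal{A} are stand-in symbols, not the actual QACI formalism
  s : \mathcal{A} \to \mathbb{R} ,
\]

so a weaker inner-aligned system can just output some action a with a high score s(a), e.g. emitting a piece of text, without needing to determine the whole future.)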

As for diamonds, we believe this is actually a harder problem than alignment, and it's a mistake to aim at it. Solving diamond-maximization requires us to point at what we ...