quila

Independent researcher theorizing about superintelligence-robust training stories.

If you disagree with me for reasons you expect I'm not aware of, please tell me!

If you have/find an idea that's genuinely novel/out-of-human-distribution while remaining analytical, you're welcome to send it to me to 'introduce chaos into my system'.

Contact: {discord: quilalove, matrix: @quilauwu:matrix.org, email: quila1<at>protonmail.com}

some look outwards, at the dying stars and the space between the galaxies, and they dream of godlike machines sailing the dark oceans of nothingness, blinding others with their flames.

-----BEGIN PGP PUBLIC KEY BLOCK-----

mDMEZiAcUhYJKwYBBAHaRw8BAQdADrjnsrbZiLKjArOg/K2Ev2uCE8pDiROWyTTO
mQv00sa0BXF1aWxhiJMEExYKADsWIQTuEKr6zx3RBsD/QW3DBzXQe0TUaQUCZiAc
UgIbAwULCQgHAgIiAgYVCgkICwIEFgIDAQIeBwIXgAAKCRDDBzXQe0TUabWCAP0Z
/ULuLWf2QaljxEL67w1b6R/uhP4bdGmEffiaaBjPLQD/cH7ufTuwOHKjlZTIxa+0
kVIMJVjMunONp088sbJBaQi4OARmIBxSEgorBgEEAZdVAQUBAQdAq5exGihogy7T
WVzVeKyamC0AK0CAZtH4NYfIocfpu3ADAQgHiHgEGBYKACAWIQTuEKr6zx3RBsD/
QW3DBzXQe0TUaQUCZiAcUgIbDAAKCRDDBzXQe0TUaUmTAQCnDsk9lK9te+EXepva
6oSddOtQ/9r9mASeQd7f93EqqwD/bZKu9ioleyL4c5leSQmwfDGlfVokD8MHmw+u
OSofxw0=
=rBQl
-----END PGP PUBLIC KEY BLOCK-----


Comments

quila10

Record yourself typing?

quila94

(Personal) On writing and (not) speaking

I often struggle to find words and sentences that match what I intend to communicate.

Here are some problems this can cause:

  1. Wordings that are odd or unintuitive to the reader, but that are at least literally correct.[1]
  2. Not being able to express what I mean, and having to choose between not writing it or risking miscommunication by trying anyway. I tend to choose the former unless I'm writing to a close friend. Unfortunately, this means I'm unable to express some key insights to a general audience.
  3. Writing taking a lot of time: I usually have to iterate many times on words/sentences until I find one that my mind parses as referring to what I intend. In the slowest cases, I might finalize only 2-10 words per minute. Even after iterating, my words are often interpreted in ways I failed to foresee.

These apply to speaking, too. If I speak what would be the 'first iteration' of a sentence, there's a good chance it won't create an interpretation matching what I intend to communicate. In spoken language I have no chance to constantly 'rewrite' my output before sending it. This is one reason, but not the only reason, that I've had a policy of trying to avoid voice-based communication.

I'm not fully sure what caused this relationship to language. It could be that it's just a byproduct of being autistic. It could also be a byproduct of out-of-distribution childhood abuse.[2]

  1. ^

    E.g., once I couldn't find the word 'clusters,' and wrote a complex sentence referring to 'sets of similar' value functions, each corresponding to a common alignment failure mode / ASI takeoff training story. (I later found a way to make it much easier to read.)

  2. ^

    (Content warning)

    My primary parent was highly abusive, and would punish me for using language in the intuitive 'direct' way about particular instances of that. My early response was to try to euphemize and say things differently, in a way that less directly contradicted the power dynamic / social reality she enforced.

    Eventually I learned to model her as a deterministic system and stay silent / fawn.

quila42

Leaving to dissuade others within the company is another possibility

quila10

Same as usual, with each person summarizing a chapter, and then there's a group discussion where they try to piece together the true story

quila10

Have you tried it with a book that doesn't have self-contained chapters?

quila10

Conditional on us solving alignment, I agree it's more likely that we live in an "easy-by-default" world, rather than a "hard-by-default" one in which we got lucky or played very well.

I think the language used in discussions of anthropics tends, unintentionally, to mask ambiguities or conflations, especially between logical and indexical probability, so I want to be very careful writing about this. I think there may be some conceptual conflation happening here, but I'm not sure how to word it. I'll see if it becomes clear indirectly.

One difference between our intuitions may be that I'm implicitly thinking within a many-worlds frame. Within that frame, it's actually certain that we'll solve alignment in some branches.

So if we then 'condition on solving alignment in the future', my mind defaults to something like this: "this is not much of an update; it just means we're in a future where the past was not a death outcome. Some of the pasts leading up to those futures had really difficult solutions, and some of them managed to find easier ones or get lucky. The probabilities of these non-death outcomes relative to each other have not changed as a result of this conditioning." (I.e., I disagree with the top quote.)

The most probable reason I can see for this difference is if you're thinking in terms of a single future, where you expect to die.[1] In this frame, if you observe yourself surviving, it may seem[2] you should update your logical belief that alignment is hard (because P(continued observation|alignment being hard) is low, if we imagine a single future, but certain if we imagine the space of indexically possible futures).

Whereas I read the conditioning as only indexical, and I'm generally thinking about this in terms of indexical probabilities.
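
As a rough sketch of how I'd write out the difference (using H as shorthand for 'alignment is hard for humans' and S for 'we observe continued survival'; these labels are just mine):

Single-future reading: survival wasn't guaranteed, so observing it is evidence against H. P(H | S) = P(S | H) · P(H) / P(S) < P(H), since P(S | H) ≪ P(S | ¬H).

Indexical/branch reading: survival is the only thing that ever gets observed, so the likelihood is 1 under either hypothesis and the posterior equals the prior. P(S | H) = P(S | ¬H) = 1, so P(H | S) = P(H).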

I totally agree that we shouldn't update our logical beliefs in this way. I.e., that with regard to beliefs about logical probabilities (such as 'alignment is very hard for humans'), we "shouldn't condition on solving alignment, because we haven't yet." That is, we shouldn't condition on the future not being mostly death outcomes when we haven't yet averted those outcomes and have reason to think they make up most of the future.

Maybe this helps clarify my position?

On another point:

the developments in non-agentic AI we're facing are still one regime change away from the dynamics that could kill us

I agree with this, and I still found the current systems' lack of goals over the world surprising, and worth trying to get as a trait of superintelligent systems.

  1. ^

    (I'm not disagreeing with this being the most common outcome)

  2. ^

    Though after reflecting on it more, I think (with low confidence) that this is wrong, and that one's logical probabilities shouldn't change after surviving in a 'one-world frame' universe either.

    For an intuition pump: consider the case where you've crafted a device which, when activated, leverages quantum randomness to kill you with probability (n-1)/n, where n is some arbitrarily large number. Given you've crafted it correctly, you make no update in the many-worlds frame because survival is the only thing you will observe; you expect to observe the 1/n branch.

    In the 'single world' frame, continued survival isn't guaranteed, but it's still the only thing you could possibly observe, so it intuitively feels like the same reasoning applies...?

quila10

It sounds like you're anthropic updating on the fact that we'll exist in the future

The quote you replied to was meant to be about the past.[1]

I can see why it looks like I'm updating on existing in the future, though.[2] I think it may be more interpretable when framed as choosing actions based on what kinds of paths into the future are more likely, which I think should include assessing where our observations so far would fall.

Specifically, I think that ("we find a general agent-alignment solution right as takeoff is very near" given "early AGIs take a form that was unexpected") is less probable than ("observing early AGIs causes us to form new insights that lead to a different class of solution" given "early AGIs take a form that was unexpected"). Because I think that, and because I think we're at the point where takeoff is near, this seems like some evidence that we're on that second path.

This should only constitute an anthropic update to the extent you think more-agentic architectures would have already killed us

I do think that. I think that superintelligence is possible to create with much less compute than is being used for SOTA LLMs. Here's a thread with some general arguments for this.

Of course, you could claim that our understanding of the past is not perfect, and thus should still update

I think my understanding of why we've survived so far re: AI is far from perfect. For example, I don't know what would have needed to happen for training setups that would have produced agentic superintelligence by now to have been found first, or (framed inversely) how lucky we needed to be to survive this far.

~~~

I'm not sure if this reply will address the disagreement, or if it will still seem from your pov that I'm making some logical mistake. I'm not actually fully sure what the disagreement is. You're welcome to try to help me understand if one remains.

I'm sorry if any part of this response is confusing; I'm still learning to write clearly.

  1. ^

    I originally thought you were asking why it's true of the past, but then I realized we very probably agreed (in principle) in that case.

  2. ^

    And to an extent it internally feels like I'm doing this, and then asking "what do my actions need to be to make this true?" in a similar sense to how an FDT agent would act in transparent Newcomb's problem. But framing it like this is probably unnecessarily confusing, and I feel confused about this description.

quila10

(I think I misinterpreted your question and started drafting another response, will reply to relevant portions of this reply there)

quila10

Suggestion: a marker for recommended posts that are more than some duration x old. I was just reading this post, which was recommended to me, and got halfway through before seeing it was 2 years out of date :(

https://www.lesswrong.com/posts/3S4nyoNEEuvNsbXt8/common-misconceptions-about-openai

(Or maybe it's unnecessary and I'll get used to checking post dates on the algorithmic frontpage)

quila42

At what point should I post content as top-level posts rather than shortforms?

For example, a recent piece of writing I posted to shortform was ~250 concise words plus an image: 'Anthropics may support a 'non-agentic superintelligence' agenda'. It would be a top-level post on my blog if I had one set up (maybe soon :p).

Some general guidelines on this would be helpful.
