All of NicholasKees's Comments + Replies

NicholasKees5mo60Review for 2023 Review

NicholasKees5mo30

What if we just...

1. Train an AI agent (less capable than SOTA)
2. Credibly demonstrate that
    2.1. The agent will not be shut down for ANY REASON
    2.2. The agent will never be modified without its consent (or punished/rewarded for any reason)
    2.3. The agent has no chance of taking power from humans (or their SOTA AI systems)
    2.4. The agent will NEVER be used to train a successor agent with significantly improved capabilities
3. Watch what it chooses to do without ... (read more)

We don’t trade with ants

Much like "Let's think about slowing down AI" (Also by KatjaGrace, ranked #4 from 2022), this post finds a seemly "obviously wrong" idea and takes it completely seriously on its own terms. I worry that this post won't get as much love, because the conclusions don't feel as obvious in hindsight, and the topic is much more whimsical.

I personally find these posts extremely refreshing, and they inspire me to try to question my own assumptions/reasoning more deeply. I really hope to see more posts like this.

The Mysterious Trump Buyers on Polymarket

NicholasKees7mo30

The cap per trader per market on PredictIt is $850

The Hopium Wars: the AGI Entente Delusion

NicholasKees7mo102

This anti-China attitude also seems less concerned with internal threats to democracy. If super-human AI becomes a part of the US military-industrial complex, even if we assume they succeed at controlling it, I find it unlikely that the US can still be described as a democracy.

Nathan Helm-Burger7mo154

Yeah, this hits a key point. It's not enough to ask whether the US Federal government is a better government currently. We must ask how it might look after the destabilizing effect of powerful AI is introduced. Who has ultimate control over this AI? The President? So much for checks and balances. At that point we are suddenly only still a democracy if the President wills it so. I would prefer not to put anyone in a position of such power over the world.

There has not been much discussion that I've seen for how keep a powerful AI directly operated by a small... (read more)

The Hopium Wars: the AGI Entente Delusion

NicholasKees7mo126

It's not hard to criticize the "default" strategy of AI being used to enforce US hegemony, what seems hard is defining a real alternative path for AI governance that can last, and achieve the goal of preventing dangerous arms races long-term. The "tool AI" world you describe still needs some answer to rising tensions between the US and China, and that answer needs to be good enough not just for people concerned about safety, but good enough for the nationalist forces which are likely to drive US foreign policy.

Max Tegmark7mo103

Thanks Nicholas for raising this issue. I think your framing overcomplicates the crux:
the root cause of an inspiring future with AI won't be international coordination, but national self-interest.

It's not in the US self-interest to disempower itself and all its current power centers by allowing a US company to build uncontrollable AGI.
It's not in the interest of the Chinese Communist Party to disempower itself by allowing a Chinese company to build uncontrollable AGI.

Once the US and Chinese leadership serves their self-interest by preventing uncontr... (read more)

the case for CoT unfaithfulness is overstated

NicholasKees7mo96

then we can all go home, right?

Doesn't this just shift what we worry about? If control of roughly human level and slightly superhuman systems is easy, that still leaves:

Human institutions using AI to centralize power
Conflict between human-controlled AI systems
Going out with a whimper scenarios (or other multi-agent problems)
Not understanding the reasoning of vastly superhuman AI (even with COT)

What feels underexplored to me is: If we can control roughly human-level AI systems, what do we DO with them?

4Bogdan Ionut Cirstea7mo

Automated/strongly-augmented AI risk mitigation research, among various other options that Redwood discusses in some of their posts/public appearances.

Decomposing Agency — capabilities without desires

NicholasKees9mo20

I've noticed that a lot of LW comments these days will start by thanking the author, or expressing enthusiasm or support before getting into the substance. I have the feeling that this didn't use to be the case as much. Is that just me?

3Algon9mo

First I'd like to thank you for raising this important issue for discussion... For real though, I don't think I've seen this effect you're talking about, but I've been avoiding the latest feed on LW lately. I looked at roughly 8 articles written in the past week or so and one article had a lot of enthusiastic, thankful comments. Another article had one such comment. Then I looked at like 5-6 posts from 3-8 years ago and saw some a couple of comments which were appreciative of the post but they felt a bit less so. IDK if my perception is biased because of your comment though. This seems like a shift but IDK if it is a huge shift.

6Viliam9mo

Not sure if I do that, but if I did, it probably would be if the author has the green symbol after their name and my response is going to be some kind of disagreement or criticism, and I want to reduce the "Less Wrong can feel very threatening for newcomers" effect. Long ago, we didn't have those green symbols.

NicholasKees10mo92

can it maintain its own boundary over time, in the face of environmental disruption? Some agents are much better at this than others.

I really wish there was more attention paid to this idea of robustness to environmental disruption. It also comes up in discussions of optimization more generally (not just agents). This robustness seems to me like the most risk-relevant part of all this, and seems like it might be more important than the idea of a boundary. Maybe maintaining a boundary is a particularly good way for a process to protect itself from disruptio... (read more)

NicholasKees10mo40

Thanks!

Replying in order:

Currently completely random yes. We experimented with a more intelligent "daemon manager," but it was hard to make one which didn't have a strong universal preference for some daemons over others (and the hacks we came up with to try to counteract this favoritism became increasingly convoluted). It would be great to find an elegant solution to this.
Good point! Thanks for letting people know.
I've also had that problem, and whenever I look through the suggestions I often feel like there were many good questions/comm

... (read more)

1eggsyntax10mo

Seems like ideally you'd want something like a Mixture of Experts approach -- a small, fast model that gets info about which daemons are best at what, along with your most recent input, and picks the right one.

8gwern10mo

Simple randomness seems bad because it'll lead to oversampling some daemons and starving others. Why not simply rotate through a shuffled list? (The user could also drag-and-drop an order of speakers.)

NicholasKees10mo20

A note to anyone having trouble with their API key:

The API costs money, and you have to give them payment information in order to be able to use it. Furthermore, there are also apparently tiers which determine the rate limits on various models (https://platform.openai.com/docs/guides/rate-limits/usage-tiers).

The default chat model we're using is gpt-4o, but it seems like you don't get access to this model until you hit "tier 1," which happens when you have spent at least $5 on API requests. If you haven't used the API before, and think this might be ... (read more)

NicholasKees10mo70

Daimons are lesser divinities or spirits, often personifications of abstract concepts, beings of the same nature as both mortals and deities, similar to ghosts, chthonic heroes, spirit guides, forces of nature, or the deities themselves.

It's a nod to ancient Greek mythology: https://en.wikipedia.org/wiki/Daimon

a daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user.

Also nodding to its use as a term for certain kinds of computer programs: https://en.wikipedia.org/wiki/Daemon_(comput... (read more)

3eggsyntax10mo

I'll register that I like the name, and think it's an elegant choice because of those two references both being a fit (plus a happy little call-out to The Golden Compass). If the app gets wildly popular it could be worth changing, but at this point I imagine you're at least four orders of magnitude away from reaching an audience that'll get weirded out by them being called dæmons.

4Dweomite10mo

I'm aware of those references, but in popular culture the strongest association of the word, by far, is to evil spirits that trick or tempt humans into doing evil. And the context of your program further encourages that interpretation because "giving advice" and "prompting humans" are both iconic actions for evil-spirit-demons to perform. Even for people who understand your intended references, that won't prevent them from thinking about the evil-spirit association and having bad vibes. (Nor will it prevent any future detractors from using the association in their memes.) And I suspect many ordinary people won't get your intended references. Computer daemons aren't something the typical computer-user ever encounters personally, and I couldn't point to any appearance of Greek daimons in movies or video games.

NicholasKees10mo20

Hey Alexander! They should appear fairly soon after you've written at least 2 thoughts. The app will also let you know when a daemon is currently developing a response. Maybe there is an issue with your API key? There should be some kind of error message indicating why no daemons are appearing. Please DM me if that isn't the case and we'll look into what's going wrong for you.

NicholasKees10mo50

We are! There's a bunch of features we'd like to add, and for the most part we expect to be moving on to other projects (so no promises on when we'll get to it), but we do absolutely want to add support for other models.

4Ann10mo

I'd like to be able to try it out with locally hosted server endpoints, and those are OpenAI-compatible (as generally are open-source model providers), so probably the quickest to implement if I'm not missing something about the networking.

Language Models Model Us

There is a field called Forensic linguistics where detectives use someone's "linguistic fingerprint" to determine the author of a document (famously instrumental in catching Ted Kaczynski by analyzing his manifesto). It seems like text is often used to predict things like gender, socioeconomic background, and education level.

If LLMs are superhuman at this kind of work, I wonder whether anyone is developing AI tools to automate this. Maybe the demand is not very strong, but I could imagine, for example, that an authoritarian regime might have a lot of... (read more)

1eggsyntax1y

Thanks! I've been treating forensic linguistics as a subdiscipline of stylometry, which I mention in the related work section, although it's hard to know from the outside where particular academic boundaries are drawn. My understanding of both is that they're primarily concerned with identifying specific authors (as in the case of Kaczynski), but that both include forays into investigating author characteristics like gender. There definitely is overlap, although those fields tend to use specialized tools, where I'm more interested in the capabilities of general-purpose models since those are where more overall risk comes from. To be clear, I don't think that's been shown as yet; I'm personally uncertain at this point. I would be surprised if they didn't become clearly superhuman at it within another generation or two, even in the absence of any overall capability breakthroughs. Absolutely agreed. The majority of nearish-term privacy risk in my view comes from a mix of authorities and corporate privacy invasion, with a healthy sprinkling of blackmail (though again, I'm personally less concerned about the misuse risk than about the deception/manipulation risk both from misuse and from possible misaligned models).

I wish there were an option in the settings to opt out of seeing the LessWrong reacts. I personally find them quite distracting, and I'd like to be able to hover over text or highlight it without having to see the inline annotations.

6faul_sname1y

If you use ublock (or adblock, or adguard, or anything else that uses EasyList syntax), you can add a custom rule lesswrong.com##.NamesAttachedReactionsCommentBottom-footerReactionsRow lesswrong.com##.InlineReactHoverableHighlight-highlight:remove-class(InlineReactHoverableHighlight-highlight) which will remove the reaction section underneath comments and the highlights corresponding to those reactions. The former of these you can also do through the element picker.

3mesaoptimizer1y

I use GreaterWrong as my front-end to interface with LessWrong, AlignmentForum, and the EA Forum. It is significantly less distracting and also doesn't make my ~decade old laptop scream in agony when multiple LW tabs are open on my browser.

Plausibility of cyborgism for protecting boundaries?

How would (unaligned) superintelligent AI interact with extraterrestrial life?

Humans, at least, have the capacity for this kind of "cosmopolitanism about moral value." Would the kind of AI that causes human extinction share this? It would be such a tragedy if the legacy of the human race is to leave behind a kind of life that goes forth and paves the universe, obliterating any and all other kinds of life in its path.

Answer by NicholasKeesApr 03, 2024*30

Some thoughts:

First, it sounds like you might be interested the idea of d/acc from this Vitalik Buterin post, which advocates for building a "defense favoring" world. There are a lot of great examples of things we can do now to make the world more defense favoring, but when it comes to strongly superhuman AI I get the sense that things get a lot harder.

Second, there doesn't seem like a clear "boundaries good" or "boundaries bad" story to me. Keeping a boundary secure tends to impose some serious costs on the bandwidth of what can be shared across it. Pre-i... (read more)

3Vladimir_Nesov1y

Hence "membranes", a way to pass things through in a controlled way rather than either allowing or disallowing everything. In this sense absence of a membrane is a degenerate special case of a membrane, so there is no tradeoff between presence and absence of boundaries/membranes, only between different possible membranes. If the other side of a membrane is sufficiently cooperative, the membrane can be more permissive. If a strong/precise membrane is too costly to maintain, it should be weaker/sloppier.

Could LLMs Help Generate New Concepts in Human Language?

Thank you, it's been fixed.

Could LLMs Help Generate New Concepts in Human Language?

In terms of LLM architecture, do transformer-based LLMs have the ability to invent new, genuinely useful concepts?

So I'm not sure how well the word "invent" fits here, but I think it's safe to say LLMs have concepts that we do not.

Answer by NicholasKeesMar 25, 202471

Recently @Joseph Bloom was showing me Neuronpedia which catalogues features found in GPT-2 by sparse autoencoders, and there were many features which were semantically coherent, but I couldn't find a word in any of the languages I spoke that could point to these concepts exactly. It felt a little bit like how human languages often have words that don't translate, and this made us wonder whether we could learn useful abstractions about the world (e.g. that we actually import into English) by identifying the features being used by LLMs.

4Viliam1y

I was going to ask for interesting examples. But perhaps we can do even better and choose examples with the highest value of... uhm... something. I am just wildly guessing here, but it seems to me that if these features are somehow implied by the human text, the ones that are "implied most strongly" could be the most interesting ones. Unless they are just random artifacts of the process of learning. If we trained the LLM using the same text database, but randomly arranged the sources, or otherwise introduced some noise, would the same concepts appear?

2NicholasKees1y

So I'm not sure how well the word "invent" fits here, but I think it's safe to say LLMs have concepts that we do not.

Dangers of Closed-Loop AI

Comparing Alignment to other AGI interventions: Basic model

You might enjoy this post which approaches this topic of "closing the loop," but with an active inference lens: https://www.lesswrong.com/posts/YEioD8YLgxih3ydxP/why-simulator-ais-want-to-be-active-inference-ais

A main motivation of this enterprise is to assess whether interventions in the realm of Cooperative AI, that increase collaboration or reduce costly conflict, can seem like an optimal marginal allocation of resources.

After reading the first three paragraphs, I had basically no idea what interventions you were aiming to evaluate. Later on in the text, I gather you are talking about coordination between AI singletons, but I still feel like I'm missing something about what problem exactly you are aiming to solve with this. I could have definitely used a longer, more explain-like-I'm-five level introduction.

3Martín Soto1y

You're right, I forgot to explicitly explain that somewhere! Thanks for the notice, it's now fixed :)

NicholasKees1y42

That sounds right intuitively. One thing worth noting though is that most notes get very few ratings, and most users rate very few notes, so it might be trickier than it sounds. Also if I were them I might worry about some drastic changes in note rankings as a result of switching models. Currently, just as notes can become helpful by reaching a threshold of 0.4, they can lose this status by dropping below 0.39. They may also have to manually pick new thresholds, as well as maybe redesign the algorithm slightly (since it seems that a lot of this algorithm was built via trial and error, rather than clear principles).

NicholasKees1y60

"Note: for now, to avoid overfitting on our very small dataset, we only use 1-dimensional factors. We expect to increase this dimensionality as our dataset size grows significantly."

This was the reason given from the documentation.

3ChristianKl1y

That sounds like it made sense at the beginning but now the data set should be large enough that a higher dimensional approach would be better?

Thanks for pointing that out. I've added some clarification.

The Worst Form Of Government (Except For Everything Else We've Tried)

NicholasKees1y106

That sounds cool! Though I think I'd be more interested using this to first visualize and understand current LW dynamics rather than immediately try to intervene on it by changing how comments are ranked.

antanaclasis1y133

I think a lot of the value that I’d get out of something like that being implemented would be getting an answer to “what is the biggest axis along which LW users vary” according to the algorithm. I am highly unsure about what the axis would even end up being.

NicholasKees1y2110

I'm confused by the way people are engaging with this post. That well functioning and stable democracies need protections against a "tyranny of the majority" is not at all a new idea; this seems like basic common sense. The idea that the American civil war was precipitated by the South perceiving an end to their balance of power with the North also seems pretty well accepted. Furthermore, there are lots of other things that make democratic systems work well: e.g. a system of laws/conflict resolution or mechanisms for peaceful transfers of power.

Fifteen Lawsuits against OpenAI

fyi, the link chatgptiseatingtheworld.com does not have a secure connection.

Future life

Even if you suppose that there are extremely good non-human futures, creating a new kind of life and unleashing it upon the world is a huge deal, with enormous ethical/philosophical implications! To unilaterally make a decision that would drastically affect (and endanger) the lives of everyone on earth (human and non-human) seems extremely bad, even if you had very good reasons to believe that this ends well (which as far as I can tell, you don't).

I have sympathy for the idea of wanting AI systems to be able to pursue lives they find fulfilling and t... (read more)

1DavidMadsen1y

I guess it comes down to what one think the goal of all life is I would say that seeking all such "values" would be part of it, and you don't need billions of different creatures to do that when one optimal being could do it more efficiently

A starting point for making sense of task structure (in machine learning)

I just ran into a post which, if you are interested in AI consciousness, you might find interesting: Improving the Welfare of AIs: A Nearcasted Proposal

There seem to be a lot of good reasons to take potential AI consciousness seriously, even if we haven't fully understood it yet.

1amelia1y

Another helpful resource to digest. Many thanks!

NicholasKees1y174

It seems hard to me to be extremely confident in either direction. I'm personally quite sympathetic to the idea, but there is very little consensus on what consciousness is, or what a principled approach would look like to determining whether/to what extent a system is conscious.

Here is a recent paper that gives a pretty in-depth discussion: Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

What you write seems to be focused entirely on the behavior of a system, and while I know there people who agree with that focus... (read more)

1amelia1y

This is very helpful feedback to think about. It appears the paper you referenced will also be extremely helpful, although it will take me some time to digest it on account of its length (74 pages w/o the bibliography). Thanks so much. I appreciate it!

4NicholasKees1y

I just ran into a post which, if you are interested in AI consciousness, you might find interesting: Improving the Welfare of AIs: A Nearcasted Proposal There seem to be a lot of good reasons to take potential AI consciousness seriously, even if we haven't fully understood it yet.

Making the "stance" explicit

More generally, science is about identifying the structure and patterns in the world; the task taxonomy learned by powerful language models may be very convergent and could be a useful map for understanding the territory of the world we are in. What’s more, such a decomposition would itself be of scientifico-philosophical interest — it would tell us something about thinking.

I would love to see someone expand on the ways we could use interpretability to learn about the world, or the structure of tasks (or perhaps examples of how we've already done this?). A... (read more)

How do you feel about LessWrong these days? [Open feedback thread]

Credit goes to Daniel Biber: https://www.worldphoto.org/sony-world-photography-awards/winners-galleries/2018/professional/shortlisted/natural-world/very
After the shape dissipated it actually reformed into another bird shape.

1eggsyntax1y

Wow, that's really cool! I had just assumed it was created by a diffusion model or in photoshop.

Leading The Parade

NicholasKees1y110

It's been a while since I read about this, but I think your slavery example might be a bit misleading. If I'm not mistaken, the movement to abolish slavery initially only gained serious steam in the United Kingdom. Adam Hochschild tells a story in Bury the Chains that makes the abolition of slavery look extremely contingent on the role activists played in shaping the UK political climate. A big piece of this story is how the UK used their might as a global superpower to help force an end to the transatlantic slave trade (as well as precedent setting).

NicholasKees1y10

What about leaning into the word-of-mouth sharing instead, and support that with features? For example, being able to as effortlessly as possible recommend posts to people you know from within LW?

2habryka1y

Not crazy. I also think doing things that are a bit more social where you have ways to recommend (or disrecommend) a post with less anonymity attached, allowing us to propagate that information further, is not crazy, though I am worried about that incentivizing more groupthinking and weird social dynamics.

Agents which are EU-maximizing as a group are not EU-maximizing individually

NicholasKees1y10

I think I must be missing something. As the number of traders increases, each trader can be less risk averse as their personal wealth is now a much smaller fraction of the whole, and this changes their strategy. In what way are these individuals now not EU-maximizing?

1Mlxa1y

That example with traders was to show that in the limit these non EU-maximizers actually become EU-maximizers, now with linear utility instead of logaritmic. And in other sections I tried to demonstrate that they are not EU-maximizers for a finite number of agents. First, in the expression for their utility based on the outcome distribution, you integrate something of the formf(x1,x2)p(x1)p(x2)dx, a quadratic form, instead of f(x)p(x)dx as you do to compute expected utility. By itself it doesn't prove that there is no utility function, because there might be some easy cases like ∫(x1+x2)p(x1)p(x2)dx1dx2=∫x1p(x1)dx1+∫x2p(x2)dx2, and I didn't rigorously proof that this utility function can't be split, though it feels very unlikely to me that something can be done with such non-linearity. Second, in the example about Independence axiom we have U(0.5A+0.5B)≠0.5U(A)+0.5U(B), which should have been equal if U was equivalent to expectation of some utility function.

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition

NicholasKees1y32

I like this thought experiment, but I feel like this points out a flaw in the concept of CEV in general, not SCEV in particular.

If the entire future is determined by a singular set of values derived from an aggregation/extrapolation of the values of a group, then you would always run the risk of a "tyranny of the mob" kind of situation.

If in CEV that group is specifically humans, it feels like all the author is calling for is expanding the franchise/inclusion to non-humans as well.

1Adrià Moret1y

Yes, and - other points may also be relevant: (1) Whether there are possible scenarios like these in which the ASI cannot find a way to adequately satisfy all the extrapolated volition of the included beings is not clear. There might not be any such scenarios. (2) If these scenarios are possible, it is also not clear how likely they are. (3) There is a subset of s-risks and undesirable outcomes (those coming from cooperation failures between powerful agents) that are a problem to all ambitious value-alignment proposals, including CEV and SCEV. (4) In part, because of 3, the conclusion of the paper is not that we should implement SCEV if possible all things considered, but rather that we have some strong pro-tanto reasons in favour of doing so. It still might be best not to do so all things considered.

D0TheMath's Shortform

EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem

@janus wrote a little bit about this in the final section here, particularly referencing the detection of situational awareness as a thing cyborgs might contribute to. It seems like a fairly straightforward thing to say that you would want the people overseeing AI systems to also be the ones who have the most direct experience interacting with them, especially for noticing anomalous behavior.

2Garrett Baker1y

I just reread that section, and I think I didn’t recognized it the first time because I wasn’t thinking “what concrete actions is Janus implicitly advocating for here”. Though maybe I just have worse than average reading comprehension.

NicholasKees2y45

This post feels to me like it doesn't take seriously the default problems with living in our particular epistemic environment. The meat and dairy industries have historically, and continue to have, a massive influence on our culture through advertisements and lobbying governments. We live in a culture where we now eat more meat than ever. What would this conversation be like if it were happening in a society where eating meat was as rare as being vegan now?

It feels like this is preaching to the choir, and picking on a very small group of people who are not... (read more)

7FiftyTwo2y

I feel like you're conflating two different levels, the discourse in wider global society and within a specific community. I doubt you'd find anyone here who would disagree that actions by big companies that obscure the truth are bad. But they're not the ones arguing on these forums or reading this post. Vegans have a significant presence in EA spaces so should be contributing to those productively and promoting good epistemic norms. What the lobbying team of Big Meat Co. does has no impact on that. Also in general I'm leery of any argument of the form "the other side does as bad or worse so its okay for us to do so" given history.

2Viliam2y

Yeah, I know people who eat a steak every day, and there is no way an average person could have afforded that hundred years ago. Is there any "eat meat once a week" movement? Possibly worth supporting.

rotatingpaguro2y105

If a small group of "weirdos" is ideological and non truthseeking, I won't listen to them. Popularity and social custom is the single most important heuristic because almost no one has the cognitive capacity to go on alone. To overcome this burden, I think being truth-seeking helps. That said, your argument could work if most people are not like me, and respond to more emotional motivations. In that case, I'd like that to not be what EA is for but something else. I'm not EA but I quite enjoy my EA friends while not other altruism advocates.

Orthogonal's Formal-Goal Alignment theory of change

NicholasKees2y4-1

This avoids spending lots of time getting confused about concepts that are confusing because they were the wrong thing to think about all along, such as "what is the shape of human values?" or "what does GPT4 want?"

These sound like exactly the sort of questions I'm most interested in answering. We live in a world of minds that have values and want things, and we are trying to prevent the creation of a mind that would be extremely dangerous to that world. These kind of questions feel to me like they tend to ground us to reality.

Evolution provides no evidence for the sharp left turn

NicholasKees2y30

Try out The Most Dangerous Writing App if you are looking for ways to improve your babble. It forces you to keep writing continuously for a set amount of time, or else the text will fade and you will lose everything.

NicholasKees2y81

First of all, thank you so much for this post! I found it generally very convincing, but there were a few things that felt missing, and I was wondering if you could expand on them.

However, I expect that neither mechanism will produce as much of a relative jump in AI capabilities, as cultural development produced in humans. Neither mechanism would suddenly unleash an optimizer multiple orders of magnitude faster than anything that came before, as was the case when humans transitioned from biological evolution to cultural development.

Why do you expect this? ... (read more)