All of NickH's Comments + Replies

From a practical perspective, maybe you are looking at the problem the wrong way around. A lot of prompt engineering seems to be about asking LLMs to play a role. I would try telling the LLM that it is a hacker and asking it to design an exploit to attack the given system (this is the sort of mental perspective I used to use to find bugs when I was a software engineer). Another common technique is "generate then prune": have a separate model/prompt remove all the results of the first one that are only "possibilities". It seems, from my reading, that this sort of two-stage approach can work because it bypasses LLMs' typical attempts to "be helpful" by inventing stuff or spouting banal filler rather than just admitting ignorance.
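As a rough illustration of the two-stage idea, here is a minimal sketch in Python; call_llm is a hypothetical stand-in for whatever model API you use, and the prompts are just examples:

```python
# Minimal sketch of a two-stage "generate then prune" pipeline.
# call_llm is a hypothetical placeholder, not a real API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your preferred model call here")

def find_exploits(system_description: str) -> str:
    # Stage 1: role-play a hacker and generate candidate attacks.
    candidates = call_llm(
        "You are an experienced penetration tester. List concrete ways you "
        "would attack the following system:\n" + system_description
    )
    # Stage 2: a separate prompt prunes anything that is only a "possibility",
    # keeping findings backed by a specific, checkable mechanism.
    return call_llm(
        "Review this list of candidate exploits. Remove any that are mere "
        "speculation; keep only those with a concrete attack path, and state "
        "why each remaining one works:\n" + candidates
    )
```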

The CCP has no reason to believe that the US is even capable of achieving ASI, let alone that it has an advantage over the CCP. No rational actor will go to war over the possibility of a maybe when the numbers could, just as likely, be in their favour. E.g. if DeepSeek can almost equal OpenAI with fewer resources, it would be rational to allocate more resources to DeepSeek before doing anything as risky as trying to sabotage OpenAI, which is uncertain to succeed and more likely to invite uncontrollable retaliatory escalation.

The West doesn't even dare put soldiers on the ground in Ukraine for fear of an escalating Russian response. This renders the whole idea that even the US might pre-emptively attack a Russian ASI development facility totally unbelievable, and if the US can't/won't do that, then the whole idea of AI MAD fails and with it goes everything else mentioned here. Maybe you can bully the really small states, but it lacks all credibility against a large, economically or militarily powerful state. The comparison to nuclear weapons is also silly in the sense that the outc... (read more)


Whilst the title is true, I don't think that it adds much because, for most people, the authority of a researcher is probably as good as it gets. Even other researchers are probably not able to reliably tell who is or is not a good strategic thinker, so, for a layperson, there is no realistic alternative but to take the researcher seriously.
(IMHO a good proxy for strategic thinking is the ability to clearly communicate to a lay audience.)

Neel Nanda
I think the correct question is how much of an update you should make in an absolute sense rather than a relative sense. Many people in this community are overconfident, and if you decide that every person is less worth listening to than you thought, this doesn't change who you listen to, but it should make you a lot more uncertain in your beliefs.

Isn't this just an obvious consequence of the well-known fact about LLMs that the more you constrain some subset of the variables, the more you force the remaining ones to ever more extreme values?
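A toy numerical illustration of that intuition (my own construction, not specific to LLMs): if a vector's total magnitude is held fixed, forcing most coordinates towards zero makes the surviving ones more extreme.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=100)
v /= np.linalg.norm(v)            # fix the total "budget": ||v|| = 1
print(np.abs(v).max())            # largest coordinate before constraining

w = v.copy()
w[:90] = 0.0                      # constrain 90 of the 100 coordinates to zero
w /= np.linalg.norm(w)            # keep the overall budget unchanged
print(np.abs(w).max())            # the remaining coordinates are now much larger
```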

Owain_Evans
I don't think this explains the difference between the insecure model and the control models (secure and educational secure).

Sounds backwards to me. It seems more like "our values are those things that we anticipate will bring us reward" than that rewards are what tell us about our values. 

When you say "I thought I wanted X, but then I tried it and it was pretty meh," that just seems wrong. You really DID want X. You valued it then because you thought it would bring you reward. Maybe you just happened to be wrong. It's fine to be wrong about your anticipations. It's kind of weird to say that you were wrong about your values. Saying that your values change is kind of ... (read more)

My problem with this is that I don't believe that many of your examples are actually true.
You say that you value the actual happiness of people who you have never met, and yet your actions (and those of everyone else, including me) belie that statement. We all know that there are billions of really poor people suffering in the world, and the smart ones of us know that we are in the lucky, rich 1%, and yet we give insignificant amounts (from our perspective) of money to improve the lot of those poor people. The only way to reconcile this is to realise th... (read more)

If your world view requires valuing the ethics of (current) people of lower IQ over those of (future) people of higher IQ, then you have a much bigger problem than AI alignment. Whatever IQ is, it is strongly correlated with success, which implies a genetic drive towards higher IQ, so your feared future is coming anyway (unless AI ends us first) and there is nothing we can logically do to have any long-term influence on the ethics of smarter people coming after us.

Sorry, but you said Tetris, not some imaginary minimal thing that you now want to call Tetris but which is actually only the base object model with no input or output. You can't just eliminate the graphics processing complexity because Tetris isn't very graphics-intensive: it is just as complex to describe a GPU that processes 10 triangles in a month as one that processes 1 billion in a nanosecond.

As an aside, the complexity of most things that we think of as simple these days is dominated by the complexity of their input and output - I'm particularly thin... (read more)

  1. In research you don't usually know your precise destination. Maybe LA, but definitely not the hotel.
  2. Research, in general, is about mapping all of California, not just the quickest route between two points, and all the friends helped with that.
  3. You say "Alice tackled the central bottleneck" but you don't say what that was, only her "solution". Alice is only key here with the benefit of hindsight. If the I5 didn't exist or was closed for some reason, then one of her friends' solutions might have been better.

Regarding the bad behaviour of governments, especially when and why they victimise their own citizens, I recommend you read The Dictator's Handbook.
https://amzn.eu/d/40cwwPx

If Neanderthals could have created a well-aligned agent, far more powerful than themselves, they would still be around and we, almost certainly, would not.
The merest possibility of creating superhuman, self-improving AGI is a total philosophical game changer.
My personal interest is in the interaction between longtermism and the Fermi paradox: any such AGI's actions are likely to be dominated by the need to prevail over any alien AGI that it ever encounters, as such an encounter is almost certain to end one or the other.

Yes. It will prioritise the future over the present.
The utility of all humans being destroyed by an alien AI in the future is 0.
The utility of populating the future light cone is very, very large and most of that utility is in the far future.
Therefore the AI should sacrifice almost everything in the near-term light cone to prevent the 0 outcome. If it could digitise all humans, or possibly just keep a gene bank, then it could still fill most of the future light cone with happy humans once all possible threats have red-shifted out of reach. Living humans are a small but non-zero risk to the master plan and hence should be dispensed with.
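In crude expected-utility terms (my own sketch of this argument, with $V$ the value of a populated future light cone and $\varepsilon$ the extra extinction risk from keeping live humans around):

$$E[U(\text{keep humans alive now})] \approx (1-\varepsilon)\,V, \qquad E[U(\text{store genomes, expand first})] \approx V,$$

so a pure maximiser prefers the second option for any $\varepsilon > 0$, because $V$ is astronomically large and the near-term cost is negligible by comparison.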

An AI with a potentially limitless lifespan will prioritise the future over the present to an extent that would, almost certainly, be bad for us now.
For example, it may seem optimal to kill off all humans whilst keeping a copy of our genetic code, so as to have more compute power and resources available to produce Von Neumann Probes to maximise the region of the universe it controls before encountering, and hopefully destroying, any similar alien AI diaspora. Only after some time, once all possible threats had been eliminated, would it start to recreate humans ... (read more)

It's even worse than that:
1) "we" know that "our" values now are, at least slightly, different to what they were 10,000 years ago.
2) We have no reason to believe that we are currently at a state of peak, absolute values (whatever that might mean) and therefore expect that, absent SGI, our values will be different in 10,000 years.
3) If we turn over power to an SGI, perfectly aligned with our current values then they will be frozen for the rest of time. Alternatively, if we want it to allow our values to change "naturally" over time it will be compelled to d... (read more)

Sorry, but you lost me at the second paragraph: "For example, the Tetris game fits in a 6.5 kB file, so the Kolmogorov complexity of Tetris is at most 6.5 kB". This is just wrong. The Kolmogorov complexity of Tetris has to include the operating system and hardware that run the program. The proof is trivial by counterexample: if you were correct, I could reduce the complexity to 0 B by creating an empty file and an OS that interprets an empty file as a command to run the Tetris code embedded in the OS.

quiet_NaN
I think formally, the Kolmogorov complexity would have to be stated as the length of a description of a Turing machine (not that this gets completely rid of any wiggle room). Of course, TMs do not offer a great gaming experience. "The operating system and the hardware" is certainly an upper bound, but also quite certainly overkill. Your floating point unit or your network stack are not going to be very busy while you play Tetris.
If you cut it down to the essentials (getting rid of things like scores which have to be displayed as characters, or background graphics or music), you have a 2d grid in which you need to toggle fields, which is isomorphic to a matrix display. I don't think that having access to boost or JCL or the python ecosystem is going to help you much in terms of writing a shorter program than you would need for a bit-serial processor. And these things can be crazy small -- this one takes about 200 LUTs and FFs. If we can agree that a universal logic gate is a reasonable primitive which would be understandable to any technological civilization, then we are talking on the order of 1k or 2k logic gates here. Specifying that on a circuit diagram level is not going to set you back by more than 10 kB.
So while you are technically correct that there is some overhead, I think directionally Malmesbury is correct in that the binary file makes for a reasonable estimate of the information content, while adding the size of the OS (sometimes multiple floppy disks, these days!) will lead to a much worse estimate.
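For what it's worth, the standard way this machine-dependence is handled is the invariance theorem: complexities measured relative to any two universal machines $U$ and $V$ differ by at most an additive constant that depends on the machines but not on the string,

$$\left|K_U(x) - K_V(x)\right| \le c_{U,V} \quad \text{for all } x,$$

so the 6.5 kB file is an upper bound only up to that constant; the "empty file plus an OS that embeds Tetris" trick just moves the complexity into the reference machine rather than eliminating it.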

What is the probability that there are not 3^^^3 anti-muggers out there who will kill 3^^^^^^3 people if I submit to the mugger? Not 0.
The original argument against Pascal's Wager does not require you to actually believe in any of the other gods, just that the probability of them existing and having the reverse utility is enough to cancel out the probability of Pascal being right.
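Making the cancellation explicit (my own formalisation, with $p$ and $q$ the tiny probabilities assigned to the mugger and to the hypothetical anti-muggers):

$$\Delta EU(\text{pay}) \approx p \cdot U\!\big(3\uparrow\uparrow\uparrow 3 \text{ lives saved}\big) \;-\; q \cdot U\!\big(3\uparrow^{6}3 \text{ lives lost}\big),$$

and since $3\uparrow^{6}3$ (i.e. 3^^^^^^3) is unimaginably larger than $3\uparrow\uparrow\uparrow 3$ (3^^^3), the second term dominates unless $q$ is correspondingly unimaginably smaller than $p$.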
 

My counter thought experiment to CEV is to consider our distant ancestors. I mean so far distant that we wouldn't call them human, maybe even as far back as some sort of fish-like creature. Suppose a super AI somehow offered this fish the chance to rapidly "advance", following its CEV, and showed it a vision of the future, us, and asked the fishy thing whether to go ahead. Do you think the fishy thing would say yes?
Similarly, if an AI offered to evolve humankind, in 50 years, into telepathic little green men that it assured us was the result of our CEV, ... (read more)

I like this except for the reference to "Newcomblike" problems, which, I feel, is misleading and obfuscates the whole point of Newcomb's paradox. Newcomb's paradox is about decision theory; if you allow cheating then it is no longer Newcomb's paradox. This article is about psychology (and possibly deceptive AI), where cheating is always a possible solution.

The words stand for abstractions, and abstractions suffer from the abstraction uncertainty principle, i.e. an abstraction cannot be simultaneously very useful/widely applicable and very precise. The more useful a word is, the less precise it will be, and vice versa. Dictionary definitions are a compromise: they never use the most precise definitions, even when such are available (e.g. for scientific terms), because such definitions are not useful for communication between most users of the dictionary. For example, if we defined red to be light with a frequenc... (read more)

Food costs are not even slightly comparable. When I was a kid (in the UK) they ran national advertising campaigns on TV for brands of flour, sugar and sliced bread. Nowadays the only reason these things aren't effectively free is that they take up valuable shelf space. Instead people are buying imported fruit and vegetables and ready-meals. It's like comparing the price of wood in the 1960s to the price of a fitted kitchen today.

AnthonyC
Came to say pretty much this. Even within product categories, quality is nearly incomparable. In the 1990s if you wanted more than a handful of options for cheese in America you needed to go to specialty stores. Things like organic flour, whole wheat flour, non-wheat flours, more than 2 varieties of apple, were very hard to come by in most places. I could easily spend a quarter as much on groceries as I currently do, if I wanted to eat like my parents ate when I was little. Less than that, if I ate like my grandparents did when they were my age. I don't do that, because the food quality is worth the price to me. In contrast, the kinds of houses they lived in, built, or bought would mostly be illegal today, either not up to code or not in line with zoning ordinances. Ditto for cars and childcare.
Edit to add: It's hard to overstate how true this is. About 10 years ago I talked to a senior manager from General Mills who told me that if they shipped cereal boxes to the stores empty, the price would probably only be a couple of cents lower. For flour it's even more true.

Large groups of people can only live together by forming social hierarchies.
The people at the top of the hierarchy want to maintain their position both for themselves AND for their children (it's a pretty good definition of a good parent).
Fundamentally the problem is that it is not really about resources: it's a zero-sum game for status, and money is just the main indicator of status in the modern world.
 

The common solution to the problem of first-timers is to make the first time explicitly free.
This is also applicable to clubs with fixed buy-in costs but unknown (to the newbie) benefits, and works well whenever the cost is relatively small (as it should be if it is optional). If they don't like the price they won't come again.

I think we can all agree on the thoughts about conflationary alliances.
On consciousness, I don't see a lot of value here apart from demonstrating the gulf in understanding between different people. The main problem I see, and this is common to most discussions of word definitions, is that only the extremes are considered. In this essay I see several comparisons of people to rocks, which is as extreme as you can get, and a few comparing people to animals, which is slightly less so, but nothing at all about the real fuzzy cases that we need to probe to decid... (read more)

Before we can even start to try to align AIs to human flourishing, we first need a clear definition of what that means. This has been a topic accessible to philosophical thought for millennia and yet still has no universally accepted definition, so how can you consider AI alignment helpful? Even if we could all agree on what "human flourishing" meant, you would still have the problem of lock-in, i.e. our AI overlords will never allow that definition to evolve once they have assumed control. Would you want to be trapped in the Utopia of someone born 3,000 years ago? Better than being exterminated, but still not what we want.

ABlue
I think the key to approaches like this is to eschew pre-existing, complex concepts like "human flourishing" and look for a definition of Good Things that is actually amenable to constructing an agent that Does Good Things. There's no guarantee that this would lead anywhere; it relies on some weak form of moral realism. But an AGI that follows some morality-you-largely-agree-with by its very structure is a lot more appealing to me than an AGI that dutifully maximizes the morality-you-punched-into-its-utility-function-at-bootup, appealing enough that I think it's worth wading into moral philosophy to see if the idea pans out. 

As a counterargument, consider mapping our ontology onto that of a baby. We can, kind of, explain some things in baby terms and, to that extent, a baby could theoretically see our neurons mapping to similar concepts in their ontology lighting up when we do or say things related to that ontology. At the same time our true goals are utterly alien to the baby.
Alternatively, imagine that you are sent back to the time of the pharaohs and had a discussion with Cheops/Khufu about the weather and forthcoming harvest - Even trying to explain it in terms of chaos th... (read more)

I've heard much about the problems of misaligned superhuman AI killing us all but the long view seems to imply that even a "well aligned" AI will prioritise inhuman instrumental goals.

Seth Herd
I'm not quite understanding yet. Are you saying that an immortal AGI will prioritize preparing to fight an alien AGI, to the point that it won't get anything else done? Or what? Immortal expanding AGI is a part of classic alignment thinking, and we do assume it would either go to war or negotiate with an alien AGI if it encounters one, depending on the overlap in their alignment/goals.

Have I missed something or is everyone ignoring the obvious problem with a superhuman AI with potentially limitless lifespan? It seems to me that such an AI, whatever its terminal goals, must, as an instrumental goal, prioritise seeking out and destroying any alien AI because, in simple terms, the greatest threat to it tiling the universe with tiny smiling human faces is an alien AI set on tiling the universe with tiny, smiling alien faces and, in a race for dominance, every second counts.
The usual arguments about logarithmic future discounting do not seem appropriate for an immortal intelligence.

habryka
This seems like a relatively standard argument, but I also struggle a bit to understand why this is a problem. If the AI is aligned it will indeed try to spread through the universe as quickly as possible, eliminating all competition, but if it shares our values, that would be good, not bad (and if we value aliens, which I think I do, then we would presumably still somehow trade with them afterwards from a position of security and stability).
Neil
I'm not clear on what you're calling the "problem of superhuman AI"?

The whole "utilizing our atoms" argument is unnecessarily extreme. It makes for a much clearer argument, and doesn't even require superhuman intelligence, to point out that the paperclip maximiser can obviously make more paperclips if it just takes all the electricity and metal that we humans currently use for other things and uses them to make more paperclips in a totally ordinary paperclip factory. We wouldn't necessarily be dead at that point, but we would be as good as dead and have no way to seize back control.

avturchin
Yes. But also the AI will not make actual paperclips for millions or even billions of years: it will spend this time conquering the universe in the most effective way. It could use Earth's materials to jump-start space exploration as soon as possible. It could preserve some humans as a bargaining resource in case it meets another AI in space.

I'm pretty disappointed by the state of AI in bridge. IMHO the key milestones for AI would be:
1) Able to read and understand a standard convention card and play with/against that convention.
2) Decide which is the best existing convention.
3) Invent new, superior conventions. This is where we should be really scared.

"is it better to suffer an hour of torture on your deathbed, or 60 years of unpleasant allergic reaction to common environmental particles?"

This only seems difficult to you because you haven't assigned numbers to the pain of torture or of the unpleasant reaction. Once you do so (as any AI utility function must), it is just math. You are not really talking about procrastination at all here.
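With made-up numbers purely for illustration: score the torture at $10^{6}$ pain-units per hour and the allergy at $1$ pain-unit per hour, and the comparison is just

$$1\ \text{hour} \times 10^{6} = 10^{6} \quad \text{vs.} \quad 60 \times 365 \times 24\ \text{hours} \times 1 \approx 5.3 \times 10^{5},$$

so under those numbers the torture is worse, and different numbers flip the answer; the hard part is choosing the numbers, not doing the math.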

IMHO this is a key area for AI research because people seem to think that making a machine with a potentially infinite lifespan behave like a human being, whose entire existence is built around their finite lifespan, is the way forward. It seems obvious to me that if you gave the most wise, kind and saintly person in the world infinite power and immortality, their behaviour would very rapidly deviate from any democratic ideal of the rest of humanity.
When considering time discounting, people do not push the idea far enough. They say that we should con... (read more)

My immediate thought was that the problem of the default action is almost certainly just as hard as the problem that you are trying to solve, whilst being harder to explain, so I don't believe that this gets us anywhere.

This is confused about who/what the agent is and about assumed goals.
The final question suggests that the agent is gravity. Nobody thinks that the goal/value function of gravity is to make the pinball fall in the hole. At a first approximation, its goal is to have ALL objects fall to earth, and we observe it thwarted in that goal almost all the time; the pinball happens to be a rare success.
If we were to suggest that the pinball machine were the agent, that might make more sense, but then we would say that the pinball machine does not make any decisions and ... (read more)

This is a great article that I would like to see go further with respect to both people and AGI.
With respect to people, it seems to me that, once we assume intent, we build on that error by then assuming the stability of that intent (because people's intents tend to be fairly stable), which then causes us to feel shock when that intent suddenly changes. We might then see this as intentional deceit and wander ever further from the truth: that it was only an unconscious whim in the first place.
Regarding AGI, this is linked to unwarranted anthropomorphism, agai... (read more)

I don't think this is relevant. It only seems odd if you believe that the job of developers is to please everyone rather than to make money. User Stories are reasonable for the goal of creating software that will make a large proportion of the target market want to buy that software. Numerous studies, and real-world evidence, show that the top few percent of products capture the vast majority of the market, and therefore software companies would be unhappy if their developers did not show a clear bias. There would only be a downside if the market showed the ... (read more)

CrimsonChin
I think when you say "I don't think this is relevant" you mean: I agree with your premise (that user stories are related to the assumed-intent bias), but I don't think that we should upend user stories yet because they do what they are supposed to. To which I agree. Development is complex, and realistically, even with user stories, developers are considering other users (not in the narrative). If you were to take away user stories and focus on tasks, developers would still imagine user intent. By using user stories we are just shifting the focus onto intent, which I think is usually a net positive. This post helped illuminate for me where it might not be a net positive.

Why no discussion of the world view? IMHO AI cannot be paused because, if the US & EU pause AI development to pursue AI safety research, then other state actors such as Russia & China will just see this as an opportunity to get ahead, with the added benefit that the rest of the world will freely give away its safety research. It's a political no-brainer unless the leaders of those countries have extreme AI safety concerns themselves. Does anyone really believe that the US would either go to war or impose serious economic sanctions on countries that did not pause?

The problem is interesting but you can view it in so many ways, many of which are contradictory.
Everyone applied theory of mind to make assumptions about what was or was not implied about the scope of the problem. The obvious lesson here is that we should never try to apply our own theory of (human) mind to an AI mind, but this is too harsh on the participants, as the problem solver was definitely not an AI, and assumptions about the scope of a question are "usually" justified when answering questions posed by humans, except that sometimes we need to delve de... (read more)