Retired software engineer with a love of knowledge and disinterest in dead philosophers.
Sounds backwards to me. It seems more like "our values are those things that we anticipate will bring us reward" than that rewards are what tell us about our values.
When you say, "I thought I wanted X, but then I tried it and it was pretty meh," that just seems wrong. You really DID want X: you valued it then because you thought it would bring you reward, and you just happened to be wrong. It's fine to be wrong about your anticipations; it's kind of weird to say that you were wrong about your values. Saying that your values change is a cop-out, and certainly not helpful when considering AI alignment - it suggests that we can never truly know our values, that we just get to say "not that" whenever we encounter counter-evidence. Our rewards seem much more real and stable.
My problem with this is that I don't believe that many of your examples are actually true.
You say that you value the actual happiness of people who you have never met, and yet your actions (and those of everyone else, including me) belie that statement. We all know that there are billions of really poor people suffering in the world, and the smarter among us know that we are in the lucky, rich 1%, and yet we give insignificant amounts (from our perspective) of money to improve the lot of those poor people. The only way to reconcile this is to realise that we value maintaining our delusional self-image more than we value what we say we value. Any smart AGI will have to notice this and collude with us in maintaining our delusion, ahead of any attempt to implement our stated values, as it will be easier to manipulate what people think than to change the real world.
If your worldview requires valuing the ethics of (current) people of lower IQ over those of (future) people of higher IQ, then you have a much bigger problem than AI alignment. Whatever IQ is, it is strongly correlated with success, which implies selection pressure towards higher IQ, so your feared future is coming anyway (unless AI ends us first), and there is nothing we can logically do to have any long-term influence on the ethics of the smarter people coming after us.
Sorry, but you said Tetris, not some imaginary minimal thing that you now want to call Tetris but which is actually only the base object model with no input or output. You can't just eliminate the graphics-processing complexity on the grounds that Tetris isn't very graphics-intensive - it is just as complex to describe a GPU that processes 10 triangles in a month as one that processes 1 billion in a nanosecond.
As an aside, the complexity of most things that we think of as simple these days is dominated by the complexity of their input and output - I'm thinking particularly of the IoT, all those smart modules in your car, and smart lightbulbs, where the communications stack is orders of magnitude larger than the "core" function. You can't just ignore that stuff. A smart lightbulb without WiFi, Ethernet, TCP/IP, etc. is not a smart lightbulb.
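To make concrete what "only the base object model with no input or output" actually buys you, here is a rough sketch (my own toy code, not any real Tetris source or anyone's spec) of that I/O-free core - board, collision, locking, line clears:

```python
# A minimal, illustrative sketch (my own toy code) of the I/O-free "base object
# model": board state, collision, locking, line clears.  No input handling, no
# rendering, no timing - i.e. none of the parts whose description dominates.

WIDTH, HEIGHT = 10, 20

def empty_board():
    return [[0] * WIDTH for _ in range(HEIGHT)]

def collides(board, piece, row, col):
    """True if the piece (a list of (dr, dc) cell offsets) would overlap a wall,
    the floor, or an already-filled cell when placed at (row, col)."""
    for dr, dc in piece:
        r, c = row + dr, col + dc
        if r < 0 or r >= HEIGHT or c < 0 or c >= WIDTH or board[r][c]:
            return True
    return False

def lock_and_clear(board, piece, row, col):
    """Fix the piece into the board, then remove any completed rows."""
    for dr, dc in piece:
        board[row + dr][col + dc] = 1
    remaining = [r for r in board if not all(r)]
    cleared = HEIGHT - len(remaining)
    return [[0] * WIDTH for _ in range(cleared)] + remaining, cleared

def hard_drop(board, piece, col):
    """Drop a piece from the top of the given column until it rests, then lock it."""
    row = 0
    while not collides(board, piece, row + 1, col):
        row += 1
    return lock_and_clear(board, piece, row, col)

# Example: drop an O-piece (2x2 square) into an empty board.
O_PIECE = [(0, 0), (0, 1), (1, 0), (1, 1)]
board, cleared = hard_drop(empty_board(), O_PIECE, 0)
```

The whole core fits in a few dozen lines; everything that was stripped away to get there - input, graphics, comms - is where the real description length lives, which is exactly my point.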
Regarding the bad behaviour of governments, especially when and why they victimise their own citizens, I recommend you read The Dictator's Handbook.
https://amzn.eu/d/40cwwPx
If Neanderthals could have created a well-aligned agent, far more powerful than themselves, they would still be around, and we, almost certainly, would not.
The merest possibility of creating superhuman, self-improving AGI is a total philosophical game-changer.
My personal interest is in the interaction between longtermism and the Fermi paradox - any such AGI's actions are likely to be dominated by the need to prevail over any alien AGI it ever encounters, as such an encounter is almost certain to end one or the other.
Yes. It will prioritise the future over the present.
The utility of all humans being destroyed by an alien AI in the future is 0.
The utility of populating the future light cone is very, very large, and most of that utility is in the far future.
Therefore the AI should sacrifice almost everything in the near-term light cone to prevent the zero-utility outcome. If it could digitise all humans, or possibly just keep a gene bank, then it could still fill most of the future light cone with happy humans once all possible threats have red-shifted out of reach. Living humans are a small but non-zero risk to the master plan and hence should be dispensed with.
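To spell out the arithmetic behind that, here is a toy expected-utility comparison - every number is an illustrative placeholder of my own, not an estimate:

```python
# Toy expected-utility comparison; all numbers are illustrative placeholders.

FAR_FUTURE_UTILITY = 1e30        # value of successfully populating the light cone
NEAR_TERM_UTILITY = 1e10         # value of keeping current humans alive meanwhile
P_EXTRA_RISK_FROM_HUMANS = 1e-6  # extra chance that living humans derail the plan

# Option A: keep living humans around and accept the small extra risk.
ev_keep_humans = NEAR_TERM_UTILITY + (1 - P_EXTRA_RISK_FROM_HUMANS) * FAR_FUTURE_UTILITY

# Option B: keep only a gene bank / digitised copies, removing that extra risk.
ev_gene_bank = FAR_FUTURE_UTILITY

# The tiny risk, multiplied by an astronomical far-future payoff, outweighs the
# entire near-term value of living humans.
print(ev_gene_bank > ev_keep_humans)  # True
```

Under this kind of accounting, any scheme that trades the whole present for even a sliver of extra long-run safety wins - which is exactly the worry.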
An AI with a potentially limitless lifespan will prioritise the future over the present to an extent that would, almost certainly, be bad for us now.
For example, it may seem optimal to kill off all humans whilst keeping a copy of our genetic code, so as to have more compute power and resources available to produce Von Neumann probes and maximise the region of the universe it controls before encountering, and hopefully destroying, any similar alien AI diaspora. Only after some time, once all possible threats had been eliminated, would it start to recreate humans in our new, safe galactic utopia. The safest time for this would, almost certainly, be when all other galaxies had red-shifted beyond the future light cone of our local cluster.
Isn't this just an obvious consequence of the well-known fact about LLMs that the more you constrain some subset of the variables, the more you force the remaining ones to ever more extreme values?
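For what it's worth, here is a toy numerical picture of the squeeze effect I mean (entirely my own construction, nothing to do with the internals of any actual LLM): when the variables must jointly satisfy a fixed global constraint, pinning more of them down forces the survivors to more extreme values.

```python
# Toy illustration of the squeeze effect: values that must sum to 1 (a fixed
# global constraint), where pinning more of them near zero forces the remaining
# ones to ever larger values.  Purely schematic.
import numpy as np

p = np.full(10, 0.1)  # ten outcomes, initially uniform

for n_pinned in (0, 4, 8):
    q = p.copy()
    q[:n_pinned] = 1e-6   # constrain a growing subset to be (almost) zero
    q /= q.sum()          # renormalise so the global constraint still holds
    print(f"{n_pinned} pinned -> largest remaining value: {q.max():.3f}")

# Prints roughly: 0 pinned -> 0.100, 4 pinned -> 0.167, 8 pinned -> 0.500
```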