soth02

AGI and the EMH: markets are not expecting aligned or unaligned AI in the next 30 years

Coincidentally, that scene in The Big Short takes place on January 11 (2007) :D

Replying toThat one apocalyptic nuclear famine paper is bunk

That one apocalyptic nuclear famine paper is bunk

I read it as a joke, lol.

https://www.lesswrong.com/posts/jnyTqPRHwcieAXgrA/finding-goals-in-the-world-model

Could it be possible to poison the world model an AGI is based on to cripple its power?

Use generated text/data to train world models based on faulty science like miasma, phlogiston, ether, etc.

Remove all references to the internet or connectivity based technology.

Create a new programming language that has zero real world adoption, and use that for all code based data in the training set.

There might be a way to elicit how aligned/unaligned the putative AGI is.

Enter into a Prisoner's Dilemma type scenario with the putative AGI.
Start off in the non-Nash equilibrium of cooperate/cooperate.
The number of rounds is specified at random and isn't known to participants. (possible variant is declare false last rounds, and then continue playing for x rounds).
Observe when/if the putative AGI defects in the 'last' round.

Does there have to be a reward? This is using brute force to create the underlying world model. It's just adjusting weights right?

Brute force alignment by adding billions of tokens of object level examples of love, kindness, etc to the dataset. Have the majority of humanity contribute essays, comments, and (later) video.

I wonder what kind of signatures a civilization gives off when AGI is nascent.

Replying toGato as the Dawn of Early AGI

Gato as the Dawn of Early AGI

Develop a training set for alignment via brute force. We can't defer alignment to the ubernerds. If enough ordinary people (millions? tens of millions?) contribute billions or trillions of tokens, maybe we can increase the chance of alignment. It's almost like we need to offer prayers of kindness and love to the future AGI: writing alignment essays of kindness that are posted to reddit, or videos extolling the virtue of love that are uploaded to youtube.

Replying to[$20K in Prizes] AI Safety Arguments Competition