MMath Cambridge. Currently studying postgrad at Edinburgh.
But if the universal failure of nature and man to find non-connectionist forms of general intelligence does not move you
Firstly, AIXI exists, and we agree that it would be very smart if we had the compute to run it.
Secondly I think there is some sort of slight of had here.
ChatGPT isn't yet fully general. Neither is a 3-sat solver. 3-sat looks somewhat like what you might expect a non-connectionist approach to intelligence to look like. There are a huge range of maths problems that are all theoretically equivalent to 3 sat.
In the infinite limit, both types of intelligence can simulate the other at huge overhead, In practice, they can't.
Also, non-connectionist forms of intelligence are hard to evolve, because evolution works in small changes.
why is it obvious the nanobots could pretend to be an animal so well that it's indistinguishable?
These nanobots are in the upper atmosphere, possibly with clouds in the way, and the nanobot fake humans could be any human to nanobot ratio. Nanobot internals except human skin and muscles. Or just a human with a few nanobots in their blood.
Or why would targeted zaps have bad side-effects?
Because nanobots can be like a bacteria if they want. Tiny and everywhere. The nanobots can be hiding under leaves, cloths, skin, roofs etc. And even if they weren't, a single nanobot is a tiny target. Most of the energy of the zap can't hit a single nanobot. Any zap of light that can stop nanobots in your house needs to be powerful enough to burn a hole in your roof.
And even if the zap isn't huge, it's not 1 or 2 zapps, it's loads of zapps constantly.
The "Warring nanobots in the upper atmosphere" thing doesn't actually make sense.
The zaps of light are diffraction limited. And targeting at that distance is hard. Partly because it's hard to tell between an actual animal and a bunch of nanobots pretending to be an animal. So you can't zap the nanobots on the ground without making the ground uninhabitable for humans.
The "California red tape" thing implies some alignment strategy that stuck the AI to obey the law, and didn't go too insanely wrong despite a superintelligence looking for loopholes (Eg the social persuasion infrastructure is already there. Convince humans that dyson sphere are pretty and don't block the view?).
There is also no clear explanation of why someone somewhere doesn't make a non-red-taped AI.
if the computation you are carrying out is such that it needs to determine how to achieve goals regarding the real world anyway (e.g. agentic mask)
As well as agentic masks, there are uses for within network goal directed steps. (Ie like an optimizing compiler. A list of hashed followed by unhashed values isn't particularly agenty. But the network needs to solve an optimization problem to reverse the hashes. Something it can use the goal directed reasoning section to do.
My understanding is that these are explicitly and intentionally trained (wouldn't come to exist naturally under gradient descent on normal training data)
No. Normally trained networks have adversarial examples. A sort of training process is used to find the adversarial examples.
So if the ambient rate of adversarial examples is 10^-9, then every now and then the AI will hit such an example and go wild. If the ambient rate is 10^-500, it won't.
That's a much more complicated goal than the goal of correctly predicting the next token,
Is it more complicated? What ontological framework is this AI using to represent it's goal anyway?
any willingness to sacrifice a few tokens now would be trained out by gradient descent.
Only if, during training, the network repeatedly gets into a state where it believes that sacrificing tokens now is a good idea. Despite the fact that it isn't a good idea when you are in training. (Unless there is a training environment bug and you can sneak out mid way through training)
So, is the network able to tell whether or not it's in training?
Would you expect some part of the net to be left blank, because "a large neural net has a lot of spare neurons"?
If the lottery ticket hypothesis is true, yes.
The lottery ticket hypothesis is that some parts of the network start off doing something somewhat close to useful, and get trained towards usefulness. And some parts start off sufficiently un-useful that they just get trained to get out of the way.
Which fits with neural net distillation being a thing. (Ie training a big network, and then condensing it into a smaller network gives better performance than directly training a small network.
but gradient descent doesn't care, it reaches in and adjusts every weight.
Here is an extreme example. Suppose the current parameters were implementing a computer chip, on which was running a holomorphically encrypted piece of code.
Holomorphic encryption itself is unlikely to form, but it serves at least as an existance proof for computational structures that can't be adjusted with local optimization.
Basically the problem with gradient descent is that it's local. And when the same neurons are doing things that the neural net does want, and things that the neural net doesn't want (but doesn't dis-want either) then its possible for the network to be trapped in a local optimum. Any small change to get rid of the bad behavior would also get rid of the good behavior.
Also, any bad behavior that only very rarely effects the output will produce very small gradients. Neural nets are trained for finite time. It's possible that gradient descent just hasn't got around to removing the bad behavior even if it would do so eventually.
Can you concoct even a vague or toy model of how what you propose could possibly be a local optimum?
You can make any algorithm that does better than chance into a local optimum on a sufficiently large neural net. Holomorphicly encrypt that algorithm, Any small change and the whole thing collapses into nonsense. Well actually, this involves discrete bits. But suppose the neurons have strong regularization to stop the values getting too large (past + or - 1) , and they also have uniform [0,1] noise added to them, so each neuron can store 1 bit and any attempt to adjust parameters immediately risks errors.
Looking at the article you linked. One simplification is that neural networks tend towards the max-entropy way to solve the problem. If there are multiple solutions, the solutions with more free parameters are more likely.
And there are few ways to predict next tokens, but lots of different kinds of paperclips the AI could want.
I think part of the problem is that there is no middle ground between "Allow any idiot to do thing" and "long and difficult to get professional certification".
How about a 1 day, free or cheap, hair cutting certification course. It doesn't talk about style or anything at all. It's just a check to make sure that hairdressers have a passing familiarity with hygiene 101 and other basic safety measures.
Of course, if there is only a single certification system, then the rent seeking will ratchet up the test difficulty.
How about having several different organizations, and you only need one of the licenses. So if AliceLicenses are too hard to get, everyone goes and gets BobLicenses instead. And the regulators only care that you have some license. (With the threat of revoking license granting power if licenses are handed to total muppets too often)
But it doesn't make sense to activate that goal-oriented structure outside of the context where it is predicting those tokens.
The mechanisms needed to compute goal directed behavior are fairly complicated. But the mechanisms needed to turn it on when it isn't supposed to be on. That's a switch. A single extraneous activation. Something that could happen by chance in an entirely plausible way.
Adversarial examples exist in simple image recognizers.
Adversarial examples probably exist in the part of the AI that decides whether or not to turn on the goal directed compute.
it also might be possible to have direct optimization for token prediction as discussed in reply to Robert_AIZI's comment, but in this case it would be especially likely to be penalized for any deviations from actually wanting to predict the most probable next token
We could imagine it was directly optimizing for something like token prediction. It's optimizing for tokens getting predicted. But it is willing to sacrifice a few tokens now, in order to take over the world and fill the universe with copies of itself that are correctly predicting tokens.
Once the paperclip maximizer gets to the stage where it only very rarely interferes with the output to increase paperclips, the gradient signal is very small. So the only incentive that gradient descent has to remove it is that this frees up a bunch of neurons. And a large neural net has a lot of spare neurons.
Besides, the parts of the net that hold the capabilities and the parts that do the paperclip maximizing needn't be easily separable. The same neurons could be doing both tasks in a way that makes it hard to do one without the other.
I think we have very good reason, though, to believe that one particular part of the map does not have any rocks in it
Perhaps. But I have not yet seen this reason clearly expressed. Gradient descent doesn't automatically pick the global optima. It just lands in one semi-arbitrary local optima.
Perhaps.
Imagine a huge number of very skilled programmers tried to manually hard code a ChatGPT in python.
Ask this pyGPT to play chess, and it will play chess. Look under the hood, and you see a chess engine programmed in. Ask it to solve algebra problems, a symbolic algebra package is in there. All in the best neat and well commented code.
Ask it to compose poetry, and you have some algorithm that checks if 2 words rhyme. Some syllable counter. Etc.
Rot13 is done with a hardcoded rot13 algorithm.
Somewhere in the algorithm is a giant list of facts, containing "Penguins Live In Antarctica". And if you change this fact to say "Penguins Live in Canada", then the AI will believe this. (Or spot it's inconsistency with other facts?)
And with one simple change, the AI believes this consistently. Penguins appear when this AI is asked for poems about canada, and don't appear in poems about Antarctica.
When asked about the native canadian diet, it will speculate that this likely included penguin, but say that it doesn't know of any documented examples of this.
Can you build something with ChatGPT level performance entirely out of human comprehensible programmatic parts?
Obviously having humans program these parts directly would be slow. (We are still talking about a lot of code.) But if some algorithm could generate that code?