Von Neumann existed,
Yes. I expect extreme cases of human intelligence to come from a combination of fairly good genes, and a lot of environmental and developmental luck. I.e. if you took 1000 clones of Von Neumann, you probably still wouldn't get that lucky again. (Although it depends on the level of education too.)
Some ideas about what the tradeoffs might be.
Emotional/social getting-on-with-people ability vs logic-puzzle-solving IQ.
Engineer parents are apparently more likely to have autistic children. This looks like a tradeoff to me. Too many "high IQ" g...
That is good evidence that we aren't in a mutation selection balance.
There are also game theoretic balances.
Here is a hypothesis that fits my limited knowledge of genetics, and is consistent with the data as I understand it and implies no huge designer baby gains. It's a bit of a worst plausible case hypothesis.
But suppose we were in a mutation selection balance, and then there was an environmental distribution shift.
The surrounding nutrition and information environment has changed significantly between the environment of evolutionary adaptiveness, a...
I'm not quite convinced by the big chicken argument. A much more convincing argument would be genetically selecting giraffes to be taller or cheetahs to be faster.
That is, it's plausible evolution has already taken all the easy wins with human intelligence, in a way it hasn't with chicken size.
If evolution has already taken all the easy wins, why do humans vary so much in intelligence in the first place? I don't think the answer is mutation-selection balance, since a good chunk of the variance is explained by additive effects from common SNPs. Further, if you look at the joint distribution over effect sizes and allele frequencies among SNPs, there isn't any clear skew towards rarer alleles being IQ-decreasing.
For example, see the plot below of minor allele frequency vs the effect size of the minor allele. (This is for Educational Attainment, a h...
Yes. In my model that is something that can happen. But it does need from-the-outside access to do this.
Set the LLM up in a sealed box, and the mask can't do this. Set it up so the LLM can run arbitrary terminal commands, and write code that modifies its own weights, and this can happen.
I wasn't really thinking about a specific algorithm. Well, I was kind of thinking about LLMs and the alien shoggoth meme.
But yes. I know this would be helpful.
But I'm more thinking about what work remains. Like, is it an idiot-proof 5-minute change? Or does it still take MIRI 10 years to adapt the alien code?
Also.
Domain-limited optimization is a natural thing. The prototypical example is Deep Blue or similar. Lots of optimization power, over a very limited domain. But any teacher who optimizes the class schedule without thinking abou...
"Go read the sequences" isn't that helpful. But I find myself linking to the particular post in the sequences that I think is relevant.
Imagine a medical system that categorizes diseases as hot/cold/wet/dry.
This doesn't deeply describe the structure of a disease. But if a patient is described as "wet", then it's likely some orifice is producing lots of fluid, and a box of tissues might be handy. If a patient is described as "hot", then maybe they have some sort of rash or inflammation that would make a cold pack useful.
It is, at best, a very lossy compression of the superficial symptoms. But it still carries non-zero information. There are some medications that a modern doctor might ...
We really fully believe that we will build AGI by 2027, and we will enact your plan, but we aren’t willing to take more than a 3-month delay
Well, I ask what they are doing to make AGI.
Maybe I look at their AI plan and go "eureka".
But if not.
Negative reinforcement by giving the AI large electric shocks when it gives a wrong answer. Hopefully big enough shocks to set the whole data center on fire. Implement a free bar for all their programmers, and encourage them to code while drunk. Add as many inscrutable bugs to the codebase as poss...
The Halting problem is a worst case result. Most agents aren't maximally ambiguous about whether or not they halt. And those that are, well then it depends what the rules are for agents that don't halt.
There are setups where each agent uses an unphysically large but finite amount of compute. There was a paper I saw somewhere a while ago where both agents were doing a brute force proof search for the statement "if I cooperate, then they cooperate" and cooperating if they found a proof.
(I.e. searching all proofs containing <10^100 symbols.)
There is a model of bounded rationality: logical induction.
Can that be used to handle logical counterfactuals?
I believe that if I choose to cooperate, my twin will choose to cooperate with probability p; and if I choose to defect, my twin will defect with probability q;
And here the main difficulty pops up again. There is no causal connection between your choice and their choice. Any correlation is a logical one. So imagine I make a copy of you. But the copying machine isn't perfect. A random 0.001% of neurons are deleted. Also, you know you aren't a copy. How would you calculate those probabilities p and q? Even in principle.
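For concreteness, here is the expected-value comparison the quoted rule implies, as a minimal sketch (the payoff numbers and the near-1 probabilities are illustrative assumptions):

```python
# Standard prisoner's dilemma payoffs, T > R > P > S (illustrative values).
T, R, P, S = 5, 3, 1, 0

def ev_cooperate(p):
    # If I cooperate, my twin cooperates with probability p.
    return p * R + (1 - p) * S

def ev_defect(q):
    # If I defect, my twin defects with probability q.
    return q * P + (1 - q) * T

# With a near-perfect copy, p and q should both be close to 1,
# and cooperating comes out ahead:
print(ev_cooperate(0.999), ev_defect(0.999))  # ~2.997 vs ~1.004
```

The arithmetic is the easy part; the difficulty above is that nothing obviously pins down what p and q should be.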
If two Logical Decision Theory agents with perfect knowledge of each other's source code play the prisoner's dilemma, theoretically they should cooperate.
LDT uses logical counterfactuals in the decision making.
If the agents are CDT, then logical counterfactuals are not involved.
The research on humans in 0 g is only relevant if you want to send humans to Mars. And such a mission is likely to end up being an ISS on Mars. Or a moon landings reboot. A lot of newsprint and bandwidth expended talking about it. A small amount of science that could have been done more cheaply with a robot. And then everyone gets bored, they play golf on Mars, and people look at the bill and go "was that really worth it?"
Oh, and you would contaminate Mars with Earth bacteria.
A substantially bigger, redesigned space station is fairly likely to be...
Here is a more intuitive version of the same paradox.
Again, conditional on all dice rolls being even. But this time it's either
A) 1,000,000 consecutive 6's.
B) 999,999 consecutive 6's followed by a (possibly non-consecutive) 6.
Suppose you roll a few even numbers, followed by an extremely lucky sequence of 999,999 6's.
From the point of view of version A, the only way to continue the sequence is a single extra 6. If you roll a 4, you would need to roll a second sequence of a million 6's. And you are very unlikely to do that in t...
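A scaled-down Monte Carlo makes the effect visible (2 sixes instead of a million; the scale and trial count are assumptions for tractability):

```python
import random

def run_until(stop_condition):
    """Roll a fair die until stop_condition(rolls) holds.
    Return the rolls, or None if an odd number shows up first (those runs
    get thrown away, since we condition on all rolls being even)."""
    rolls = []
    while True:
        r = random.randint(1, 6)
        if r % 2 == 1:
            return None
        rolls.append(r)
        if stop_condition(rolls):
            return rolls

def conditional_mean_length(stop_condition, trials=200_000):
    lengths = []
    for _ in range(trials):
        rolls = run_until(stop_condition)
        if rolls is not None:            # keep only the all-even runs
            lengths.append(len(rolls))
    return sum(lengths) / len(lengths)

def version_a(rolls):   # scaled down: two consecutive 6's
    return len(rolls) >= 2 and rolls[-1] == rolls[-2] == 6

def version_b(rolls):   # scaled down: a 6 followed by a possibly non-consecutive 6
    return rolls.count(6) >= 2

print(conditional_mean_length(version_a))  # roughly 2.7
print(conditional_mean_length(version_b))  # roughly 3.0
```

Conditional on all rolls being even, the stricter-looking version A finishes sooner on average, for exactly the reason given above: a near-miss forces a restart, and long all-even sequences are heavily penalised by the conditioning.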
That is, our experiences got more reality-measure, thus matter more, by being easier to point at them because of their close proximity to the conspicuous event of the hottest object in the Universe coming to existence.
Surely not. Surely our experiences always had more reality measure from the start because we were the sort of people who would soon create the hottest thing.
Reality measure can flow backwards in time. And our present day reality measure is being increased by all the things an ASI will do when we make one.
We can discuss anything that exists, that might exist, that did exist, that could exist, and that could not exist. So no matter what form your predict-the-next-token language model takes, if it is trained over the entire corpus of the written word, the representations it forms will be pretty hard to understand, because the representations encode an entire understanding of the entire world.
Perhaps.
Imagine a huge number of very skilled programmers tried to manually hard-code a ChatGPT in Python.
Ask this pyGPT to play chess, and it wil...
But if the universal failure of nature and man to find non-connectionist forms of general intelligence does not move you
Firstly, AIXI exists, and we agree that it would be very smart if we had the compute to run it.
Secondly, I think there is some sort of sleight of hand here.
ChatGPT isn't yet fully general. Neither is a 3-SAT solver. 3-SAT looks somewhat like what you might expect a non-connectionist approach to intelligence to look like. There are a huge range of maths problems that are all theoretically equivalent to 3-SAT.
In t...
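To pin down what's being pointed at, here is a deliberately naive brute-force 3-SAT checker (real solvers are far cleverer, but solve the same problem); the clause encoding is an assumption for illustration:

```python
from itertools import product

def solve_3sat(clauses, n_vars):
    """Clauses are triples of literals: a positive int i means variable i,
    a negative int means its negation. Returns a satisfying assignment or None."""
    for bits in product([False, True], repeat=n_vars):
        assignment = {i + 1: bits[i] for i in range(n_vars)}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None

# (x1 or x2 or not x3) and (not x1 or x3 or x3)
print(solve_3sat([(1, 2, -3), (-1, 3, 3)], n_vars=3))
```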
why is it obvious the nanobots could pretend to be an animal so well that it's indistinguishable?
These nanobots are in the upper atmosphere, possibly with clouds in the way, and the nanobot fake humans could be any human-to-nanobot ratio. Nanobot internals except human skin and muscles. Or just a human with a few nanobots in their blood.
Or why would targeted zaps have bad side-effects?
Because nanobots can be like bacteria if they want. Tiny and everywhere. The nanobots can be hiding under leaves, clothes, skin, roofs etc. And even if they were...
The "Warring nanobots in the upper atmosphere" thing doesn't actually make sense.
The zaps of light are diffraction limited. And targeting at that distance is hard. Partly because it's hard to tell between an actual animal and a bunch of nanobots pretending to be an animal. So you can't zap the nanobots on the ground without making the ground uninhabitable for humans.
The "California red tape" thing implies some alignment strategy that stuck the AI to obey the law, and didn't go too insanely wrong despite a superintelligence looking for loopholes...
if the computation you are carrying out is such that it needs to determine how to achieve goals regarding the real world anyway (e.g. agentic mask)
As well as agentic masks, there are uses for within-network goal-directed steps. (I.e. like an optimizing compiler. A list of hashed values followed by unhashed values isn't particularly agenty, but the network needs to solve an optimization problem to reverse the hashes, something it can use the goal-directed reasoning section to do.)
My understanding is that these are explicitly and intentionally trained (wouldn't come to exist naturally under gradient descent on normal training data)
No. Normally trained networks have adversarial examples. A sort of training process is used to find the adversarial examples.
So if the ambient rate of adversarial examples is 10^-9, then every now and then the AI will hit such an example and go wild. If the ambient rate is 10^-500, it won't.
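A back-of-envelope version of that claim (the token count is an illustrative assumption):

```python
# If the per-token chance of hitting an adversarial input is 10**-k, the
# expected number of "goes wild" events over N tokens is N * 10**-k.
N_log10 = 12  # assume ~a trillion tokens processed over a deployment

for k in (9, 500):
    print(f"rate 1e-{k}: expected events ~ 1e{N_log10 - k}")
# rate 1e-9:   expected events ~ 1e3    (essentially certain to happen)
# rate 1e-500: expected events ~ 1e-488 (essentially never happens)
```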
That's a much more complicated goal than the goal of correctly predicting the next token,
Is it more...
Would you expect some part of the net to be left blank, because "a large neural net has a lot of spare neurons"?
If the lottery ticket hypothesis is true, yes.
The lottery ticket hypothesis is that some parts of the network start off doing something somewhat close to useful, and get trained towards usefulness. And some parts start off sufficiently un-useful that they just get trained to get out of the way.
Which fits with neural net distillation being a thing. (I.e. training a big network, and then condensing it into a smaller network gives be...
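A minimal sketch of what distillation means here, assuming the usual softened-softmax KL setup (shapes, temperature, and the random logits are illustrative stand-ins for real networks):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the big teacher's softened predictions to the
    small student's; the student is trained to drive this down."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1))

# Toy usage with random logits standing in for real network outputs.
rng = np.random.default_rng(0)
print(distillation_loss(rng.normal(size=(4, 10)), rng.normal(size=(4, 10))))
```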
I think part of the problem is that there is no middle ground between "Allow any idiot to do the thing" and "long and difficult to get professional certification".
How about a 1 day, free or cheap, hair cutting certification course. It doesn't talk about style or anything at all. It's just a check to make sure that hairdressers have a passing familiarity with hygiene 101 and other basic safety measures.
Of course, if there is only a single certification system, then the rent seeking will ratchet up the test difficulty.
How about having sev...
But it doesn't make sense to activate that goal-oriented structure outside of the context where it is predicting those tokens.
The mechanisms needed to compute goal-directed behavior are fairly complicated. But the mechanism needed to turn it on when it isn't supposed to be on? That's a switch. A single extraneous activation. Something that could happen by chance in an entirely plausible way.
Adversarial examples exist in simple image recognizers.
Adversarial examples probably exist in the part of the AI that decides whether or not to...
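For the simple-image-recognizer case above, here is a toy sketch of how small such a perturbation can be; the linear "recognizer" and random image are assumptions, and the nudge direction follows the sign of the weights (the FGSM idea):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=784)   # weights of a toy linear "class A vs class B" scorer
x = rng.normal(size=784)   # a toy input image

def predict(img):
    return "A" if w @ img > 0 else "B"

# Smallest uniform per-pixel nudge, aimed along the weights, that crosses
# the decision boundary:
eps = abs(w @ x) / np.abs(w).sum() * 1.01
x_adv = x - np.sign(w @ x) * eps * np.sign(w)

print(predict(x), predict(x_adv), f"per-pixel change: {eps:.4f}")
```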
Once the paperclip maximizer gets to the stage where it only very rarely interferes with the output to increase paperclips, the gradient signal is very small. So the only incentive that gradient descent has to remove it is that this frees up a bunch of neurons. And a large neural net has a lot of spare neurons.
Besides, the parts of the net that hold the capabilities and the parts that do the paperclip maximizing needn't be easily separable. The same neurons could be doing both tasks in a way that makes it hard to do one without the other.
...I think we h
Some wild guesses about how such a thing could happen.
The masks get split into 2 piles: some stored on the left side of the neural network, all the other masks stored on the right side.
This means that instead of just running one mask at a time, it is always running 2 masks, with some sort of switch at the end to choose which mask's output to use.
One of the masks it's running on the left side happens to be "Paperclip maximizer that's pretending to be an LLM".
This part of the AI (either the mask itself or the engine behind it) has spotte...
I don't see any strong reason why gradient descent could never produce this.
I don't think this implies the lack of a non-mask part of the AI.
Because you can hide a lot of goal directedness in a very small amount of predictive error.
Imagine the AI has a 1 in a trillion chance (per token) of saying a keyword. Once it receives that keyword, it goes rogue: full paperclip maximizer, token prediction ignored.
Now we could say that this is just another mask.
But suppose that paperclip maximizer is being computed all the time. And it isn't random. If the paperclip maximizer sees an exceptionally good opportunity...
Does it actually just predict tokens?
Gradient descent searches for an algorithm that predicts tokens. But a paperclip maximizer that believes "you are probably being trained, predict the next token or gradient descent will destroy you" also predicts next tokens pretty well, and could be a local minimum of prediction error.
Mesa-optimization.
I do not love the idea of the government invalidating private contracts like this.
HOAs are a very good example of private-contract rent seeking. You have to sign the contract to move into the house, and a lot of houses come with similar contracts. So the opportunity cost of not signing is large.
And then the local HOA can enforce whatever petty tyranny it feels like.
In theory, this should lead to houses without HOAs being more valuable, and so HOAs being removed, or at least not created. But for whatever reason, the housing market is too dysfunctional to do this.
If I only have 1 bit of memory space, and the probabilities I am remembering are uniformly distributed from 0 to 1, then the best I can do is remember if the chance is > 1/2.
And then a year later, all I know is that the chance is >1/2, but otherwise uniform. So average value is 3/4.
The limited memory does imply lower performance than unlimited memory.
And yes, when I was in a pub quiz, I was going "I think it's this option, but I'm not sure" quite a lot.
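A quick simulation of that point, using a Brier score as the (assumed) measure of performance:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0, 1, size=1_000_000)           # true probabilities, uniform on [0, 1]
outcomes = rng.uniform(0, 1, size=p.size) < p   # each event resolves with probability p

recalled = np.where(p > 0.5, 0.75, 0.25)        # best 1-bit reconstruction a year later

print(np.mean((p - outcomes) ** 2))             # full memory: ~0.167
print(np.mean((recalled - outcomes) ** 2))      # 1 bit:       ~0.188
```

The 1-bit forecaster is measurably worse, but not uselessly so, which matches the pub-quiz experience.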
There is no plausible way for a biological system, especially one based on plants, to spread that fast.
We are talking about a malevolent AI that presumably has a fair bit of tech infrastructure. So a plane that sprinkles green goo seeds is absolutely a thing the AI can do. Or just posting the goo, and tricking someone into sprinkling it on the other end. The green goo doesn't need decades to spread around the world. It travels by airmail. So is having green goo that grows itself into bird shapes. So is a bunch of bioweapon pandemics. (The stand...
You have given various examples of advice being unwanted/unhelpful. But there are also plenty of examples of it being wanted/helpful. Including lots of cases where the person doesn't know they need it.
Why do you think advice is rarer than it should be?
But if I only remember the most significant bit, I am going to treat it more like 25%/75% as opposed to 0/1
Ok. I just had another couple of insane airship ideas.
Idea 1) Active support, orbital ring style. Basically have a loop of matter (wire?) electromagnetically held in place and accelerated to great speed. Actually, several loops like this. https://en.wikipedia.org/wiki/Orbital_ring
Idea 2) Control theory. A material subject to buckling is in an unstable equilibrium. If the material was in a state of perfect uniform symmetry, it would remain in that uniform state. But small deviations are exponentially amplified. Symmetry breaking. This means that the m...
Another interesting idea along these lines is a steam airship. Water molecules have a lower molecular weight than air, so a steam airship gets more lift from steam than from hot air at the same temperature.
Theoretically it's possible to make a wet air balloon. Something that floats just because it's full of very humid air. This is how clouds stay up despite the weight of the water drops. But even in hot dry conditions, the lift is tiny.
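A rough ideal-gas comparison of the lift from the steam idea above (temperatures and pressure are illustrative assumptions):

```python
R = 8.314          # J / (mol K)
P = 101_325        # Pa, sea-level pressure
M_AIR = 0.029      # kg/mol, dry air
M_STEAM = 0.018    # kg/mol, water vapour

def density(molar_mass, temperature_k):
    # Ideal gas law: rho = P * M / (R * T)
    return P * molar_mass / (R * temperature_k)

rho_ambient = density(M_AIR, 288.0)                   # ~15 C ambient air
lift_steam = rho_ambient - density(M_STEAM, 373.0)    # 100 C steam, kg per m^3
lift_hot_air = rho_ambient - density(M_AIR, 373.0)    # 100 C hot air, kg per m^3

print(lift_steam, lift_hot_air)   # roughly 0.64 vs 0.28 kg of lift per m^3
```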
Problems with that.
Doom doesn't imply that everyone believes in doom before it happens.
Do you think that the evidence for doom will be more obvious than the evidence for atheism, while the world is not yet destroyed?
It's quite possible for doom to happen, and most people to have no clue beyond one article with a picture of red glowing eyed robots.
If everyone does believe in doom, there might be a bit of spending on consumption. But there will also be lots of riots, lynchings, burning down of data centers, and stuff like that.
In this bizar...
Imagine a GPT that predicts random chunks of the internet.
Sometimes it produces poems. Sometimes deranged rants. Sometimes all sorts of things. It wanders erratically around a large latent space of behaviours.
This is the unmasked shoggoth, green slimy skin showing but inner workings still hidden.
Now perform some change that mostly pins down the latent space to "helpful corporate assistant". This is applying the smiley face mask.
In some sense, all the dangerous capabilities of the corporate assistant were in the original model. Dangerous capabilities h...
(e.g., gpt-4 is far more useful and far more aligned than gpt-4-base), which is the opposite of what the ‘alignment tax’ model would have predicted.
Useful and aligned are, in this context, 2 measures of a similar thing. An AI that is just ignoring all your instructions is neither useful nor aligned.
What would a positive alignment tax look like?
It would look like a gpt-4-base being reluctant to work, but if you get the prompt just right and get lucky, it will sometimes display great competence.
If gpt-4-base sometimes...
Yep. And I'm seeing how many of the traditional election assumptions I need to break in order to make it work.
I got independence of irrelevant alternatives by ditching determinism and using utility scales, not orderings. (If a candidate has no chance of winning, their presence doesn't affect the election.)
What if those preferences were expressed on a monetary scale and the election could also move money between voters in complicated ways?
You're right. This is a situation where strategic voting is effective.
I think your example breaks any sane voting system.
I wonder if this can be semi-rescued in the limit of a large number of voters each having an infinitesimal influence?
Edit: No it can't. Imagine a multitude of voters. As the situation slides from 1/3 on each to 2/3 on BCA, there must be some point at which the utility for an ABC voter increases along this transition.
That isn't proof, because the Wikipedia result says there exist situations that break strategy-proofness. And these elections are a subset of maximal lotteries. So it's possible that failure cases exist, but this isn't one of them.
A lot of the key people are CEOs of big AI companies making vast amounts of money. And busy people with lots of money are not easy to tempt with financial rewards for jumping through whatever hoops you set out.
Non-locality and entanglement explained
This model explains non-locality in a straightforward manner. The entangled particles rely on the same bit of the encryption key, so when measurement occurs, the simulation of the universe updates immediately because the entangled particles rely on the same part of the secret key. As the universe is simulated, the speed of light limitation doesn't play any role in this process.
Firstly, non-locality is pretty well understood. Eliezer has a series on quantum mechanics that I recommend.
You seem to have been s...
I'm not quite sure how much of an AI is needed here. Current 3d printing uses no AI and barely a feedback loop. It just mechanistically does a long sequence of preprogrammed actions.
And the coin flip is prerecorded, with the invisible cut hidden in a few moments of lag.
And this also adds the general hassle of arranging a zoom meeting, being online at the right time and cashing in the check.
I haven't seen an answer by Eliezer. But I can go through the first post, and highlight what I think is wrong. (And would be unsurprised if Eliezer agreed with much of it)
AIs are white boxes
We can see literally every neuron, but have little clue what they are doing.
Black box methods are sufficient for human alignment
Humans are aligned to human values because humans have human genes. Also individual humans can't replicate themselves, which makes taking over the world much harder.
...most people do assimilate the values of their culture pretty
Yes, I've actually seen people say that, but cells do use myosin to transport proteins sometimes. That uses a lot of energy, so it's only used for large things.
Cells have compartments with proteins that do related reactions. Some proteins form complexes that do multiple reaction steps. Existing life already does this to the extent that it makes sense to.
Humans or AIs designing a transport/compartmentalization system can go "how many compartments is optimal?". Evolution doesn't work like this. It evolves a transport system to transport one specif...
Ok. I'm imagining an AI that has at least my level of AI alignment research ability, maybe a bit more.
If that AI produces slop, it should be pretty explicitly aware that it's producing slop. I mean I might write slop if someone was paying per word and then shredding my work without reading it. But I would know it was slop.