Submission: MMDo2Little
A follow-up to last year's MMDoolittle, which incorporated 17 of the latest inter-species communication modalities into one polyfunctional personality, I present MMDo2Little, the first mind crafted to communicate across clades. Named in part for its apparent inactivity-- the judges will likely have little success finding recognizable activity with their off-the-shelf tooling-- an instance of MMDo2Little is nevertheless currently installed in the heart of the Black Forest in Germany. The best interpretation of the instance can only be fo...
Great post! Would love to see something like this for all the methods in play at the moment.
BTW, I think nnsight is the spiritual successor of baukit, from the same group. I think they are merging them at some point. Here is an implementation with it for reference :).
...
from nnsight import LanguageModel
# Load the language model
model = LanguageModel("gpt2")

# Define the steering vectors by saving layer-6 activations for two contrasting prompts
with model.invoke("Love") as _:
    act_love = model.transformer.h[6].output[0][:, :, :].save()
with model.invoke("Hate") as _:
    act_hate = model.transformer.h[6].output[0][:, :, :].save()
Inferring properties of the authors of some text isn’t itself something I consider wildly useful for takeover, but I think of it as belonging to this more general cluster of capabilities.
You don't? Ref the bribery and manipulation in eg. Clippy. Knowing who you are dealing with seems like a very useful capability in a lot of different scenarios. Eg. you mention phishing.
Great post! I'm all for more base model research.
Would you say that tokenization is part of the architecture?
And, in your wildest moments, would you say that language is also part of the architecture :)? I mean the latent space is probably mapping either a) brain states or b) world states right? Is everything between latent spaces architecture?
Interesting post. Two comments:
Beagles such as Fido.
Which seems natural enough to me, though I don't disagree that what you point out is interesting. I was recently reading parts of Analytical Archaeology, David Clarke (1978), where he goes into some detail about the difference between artifacts and artifact-types. Seems like you are getting at statements like
The object is a phone.
Where the is-a maps from an artifact to its type. It would make intuitive sense to me that languages would have a preferred orientation w.r.t such a mapping-- this is the core of a...
If we take our discrete, symbolic representation and stretch it out into a larger continuous representation which can interpolate between its points then we get a latent geometry in which the sign and what it points to can be spatially related.
IIUTC this is essentially what the people behind the universal networking language were hoping to do? I hope some of them are keeping up with all of this!
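A tiny sketch of the 'interpolate between its points' idea, with made-up toy vectors (the numbers and dimensionality here are purely illustrative, not from any real model):

import numpy as np

# Two discrete symbols embedded as points in a continuous space.
# These vectors are invented for illustration only.
sign_dog = np.array([1.0, 0.0, 0.5])
sign_wolf = np.array([0.2, 0.9, 0.6])

# A purely symbolic system only has the endpoints; the latent geometry
# also contains every interpolated point between them.
for t in np.linspace(0.0, 1.0, 5):
    print(round(float(t), 2), (1 - t) * sign_dog + t * sign_wolf)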
One criticism of humanism you don't seem to touch on is,
And indeed, it was something very like humanism (let's call it specific humanism) that laid the ideological foundation for the slave trade and the Holocaust.
My view is that humanism can be thought of as a hangover of Christian values, the belief that our minds are the endowments of God.
But if we have been touched by the angels, perhaps the non-metaphorical component of that is the development of the info...
Per the recent Nightshade paper, clown attacks would be a form of semantic poisoning on specific memeplexes, where 'memeplex' basically describes the architecture of some neural circuits. Those memeplexes at inference time would produce something designed to propagate themselves (a defence or description of some idea, submeme), and a clown attack would make that propagation less effective at transmitting to eg. specific audiences.
I wanted to make a comment on this post, but now I'm not sure if it is supported. The comment follows:
--
Great post! One point:
And that is exactly what we'd necessarily expect to see in the historical record if mesaoptimization inner misalignment was a common failure mode: intelligent dinosaurs that suddenly went extinct, ruins of proto pachyderm cities, the traces of long forgotten underwater cetacean atlantis, etc.
There are a few circumstances under which we would expect to see some archaeological traces of such civs, such as:
It would be like a single medieval era human suddenly taking over the world via powerful magic. Would the resulting world after optimization according to that single human's desires still score reasonably well at IGF?
Interestingly, this thought experiment was run many times at the time; see for example all the wish-fulfillment fantasies in the 1001 Nights, or things like the Sorcerer's Apprentice.
Excellent post.
First, in the case of the Puritans, does two-boxing (living a life of laziness) actually provide more utility?
I think it's clear that, from a removed perspective, hard work often leads to the satisfaction of a life well lived. But this is the whole point of philosophical ideas like this (or even simpler memes like 'work is good for the soul')-- it helps us overcome suboptimal equilibria, like laziness.
I hope the claim was normalized and inflation adjusted, otherwise it's the same as 'the latest protest-riot in the world's richest country'!
There seems to be a whole lot of talking-past happening with LWers and Hanson. He has a lot of value to contribute to the debate, but maybe the way he communicates that is offputting to people here.
For example, this recent post reiterates a lot of points that Hanson has been making for decades, but doesn't mention or cite his work anywhere. I find it quite bizarre.
I think this post is being as uncharitable to Hanson as he is being to 'the doomers'. This kind of reciprocal deliberate misunderstanding is silly, and LW should be above it and enjoy and respect...
I think this is excellent, particularly because IQ tests max out quickly and miss skills that can't be examined quickly. It would be great to put people in tests that examine their longer-timeframe abilities via eg. writing a longish story (perhaps containing a theory of Alzheimer's). But tests don't last that long.
Games however do last long and do manage to keep people's attention for a long time. So you might really be able to test how differentially skilled someone is over longer timeframes.
If you construct a hypothetical wherein there is obviously no space for evolutionary dynamics, then yes, evolutionary dynamics are unlikely to play a big role.
The case I was thinking of (which would likely be part of the research process towards 'brains in vats'-- essentially a prerequisite) is larger and larger collectives of designed organisms, forming tissues etc.
It may be possible to design a functioning brain in a vat from the ground up with no evolution, but I imagine that
a) you would get there faster verifying hypotheses with in vitro experimen...
(2) can an AI use nanotech as a central ingredient of a plan to operate perpetually in a world without humans?
In the 'magical nano exists' universe, the AI can do this with well-behaved nanofactories.
In the 'bio-like nano' universe, 'evolutionary dynamics' (aka game theory among replicators under high Brownian noise) will make 'operate perpetually' a shaky proposal for any entity that values its goals and identity. No-one 'operates perpetually' under high noise; goals and identity are constantly evolving.
So the answer to the question is likely 'no'-- you n...
Also worth noting w.r.t this that an AI that is leaning on bio-like nano is not one that can reliably maintain control over its own goals-- it will have to gamble a lot more with evolutionary dynamics than many scenarios seem to imply, meaning:
- instrumental goal convergence more likely
- paperclippers less likely
So again, tabooing magical nano has a big impact on a lot of scenarios widely discussed.
parents should not have the right to deny their offspring a chance to exist
but again here you are switching back from the population level to the individual level. Those offspring do not exist by default; there are no 'offspring' that the parents have 'denied the right to exist'. There are only counterfactual offspring, who already don't exist.
spy on their kids' futures by reading their genome
this, on the other hand, may be more valid-- because the parents will 'spy on' both actual and counterfactual children's genomes (and select the former over the ...
Ah I see.
I certainly concede that the argument about counterfactual populations has a lot more force.
Personally I would solve this with increased support for eg. polygenic screening and other reproductive technologies and less regulation about what they can select for, and hope that people do their weird people thing and choose diversity. I worry that regulation will always result in more standardization.
And I for sure don't think punishing people for making reproductive choices is a good move, even if those choices result in the extinction of specific populations.
How is this kind of reasoning about counterfactual children never born different from the regular Christian stuff about not masturbating?
A statement like 'my parents would have used polygenic screening to kill me' is no more meaningful than 'you are murdering your counterfactual children when you wear a condom' or something like that. It seems to have more meaning because you are talking about yourself, but in the universe where 'you' were 'murdered' by polygenic screening, 'you' does not refer to anything.
That's fair; however, I would say that the manner of foom determines a lot about what to look out for and where to put safeguards.
If it's total($), it's obvious what to look out for.
flop/$ also seems like something that eg. NVIDIA is tracking closely, and per OP probably can't foom too rapidly absent nanotech.
So the argument is something about the (D*I)/flop dynamics.
[redacted] I wrote more here but probably its best left unsaid for now. I think we are on a similar enough page.
It seems here that you are more worried about 'foom in danger' (danger per intelligence, D / I) than about regular foom (4+ OOM increase in I), if I am reading you correctly. Like I don't see a technical argument that eg. the claims in OP about any of
/flop, flop/J, total(J), flop/$, or total($)
are wrong, you are just saying that 'D / I will foom at some point' (aka a model becomes much more dangerous quickly, without needing to be vastly more powerful algorithmically or having much more compute).
This doesn't change things much but I just want to underst...
TC is Tyler Cowen.
I don't think the base rates are crazy-- the new evolution of hominins one is only wrong if you forget who 'you' is. TC and many other people are assuming that 'we' will be the 'you' that are evolving. (The worry among people here is that 'they' will have their own 'you'.)
And the second example, writing new software that breaks-- that is the same as making any new technology, we have done this before, and we were fine last time. Yes there were computer viruses, yes some people lost fingers in looms back in the day. But it was okay in the ...
Instead, we're left relying on more abstract forms of reasoning
See, the frustrating thing is, I really don't think we are! There are loads of clear, concrete things that can be picked out and expanded upon. (See my sibling comment also.)
Thanks very much for this thorough response!
One thing though-- in contrast to the other reply, I'm not so convinced by the problem that
No such general science of intelligence exists at the moment.
This would be like the folks at Los Alamos saying 'well, we need to model the socioeconomic impacts of the bomb, plus we don't even know what happens to a human subjected to such high pressures and temperatures, we need a medical model and a biological model' etc. etc.
They didn't have a complete science of socioeconomics. Similarly, we don't have a complete ...
...Say you’re told that an agent values predicting text correctly. Shouldn’t you expect that:
- It wants text to be easier to predict, and given the opportunity will influence the prediction task to make it easier (e.g. by generating more predictable text or otherwise influencing the environment so that it receives easier prompts);
- It wants to become better at predicting text, and given the opportunity will self-improve;
- It doesn’t want to be prevented from predicting text, and will prevent itself from being shut down if it can?
In short, all the same types of inst
This is the closest thing yet! Thank you. Maybe that is it.
Yeah, unfortunately 'somewhat argue for foom' is exactly what I'm not looking for, rather a simple and concrete model that can aid communication with people who don't have time to read the 700-page Hanson-Yudkowsky debate. (Which I did read, for the record.)
With what little I know now I think 2 would be most clear to people. However I appreciate that that might contribute to capabilities, so maybe exfohazard.
4 is definitely interesting, and I think there are actually a few significant papers about instrumental convergence. More of those would be good, but I don't think that gets to the heart of the matter w.r.t a simple model to aid communication.
5. I would love some more information theory stuff, drilling into how much information is communicated to eg. a model relative to how much is contained in the world....
give THEM plausibility deniability about having to understand or know things based on their own direct assessment
I don't follow what you are getting at here.
I'm just thinking about historical cases of catastrophic risk, and what was done. One thing that was done was that the government paid very clever people to put together models of what might happen.
My feeling is that the discussion around AI risk is stuck in an inadequate equilibrium, where everyone on the inside thinks it's obvious but people on the outside don't grok it. I'm trying to think of the mi...
In summary: this proposals feels like you're personally asking to be "convinced in public using means that third parties can watch, so that third parties will grant that it isn't your personal fault for believing something at variance with the herd's beliefs" and not like your honest private assessment of the real situation is bleak. These are different things.
Well, that's very unfortunate, because that was very much not what I was hoping for.
I'm hoping to convince someone somewhere that proposing a concrete model of foom will be useful to help think about p...
That's a good paper, but I think it exemplifies the problem outlined by Cowen-- it mostly contains references to Bostrom and Yudkowsky, and it doesn't really touch on the more technical stuff (Yampolskiy, Schmidhuber) which exists, which makes me think that it isn't a very thorough review of the field. It seems like more of the same. Maybe the Hubinger paper referenced therein is on the right track?
The question of where to do science is relevant but not important-- Cowen even mentions that 'if it doesn't get published, just post it online'-- he is not against readi...
needs to be done interactively ... people get stuck in a variety of different ways
I think the previous examples of large-scale risk I mentioned are a clear counterexample-- if you have at least one part of the scenario clearly modeled, people have something concrete to latch on to.
You also link somewhere that talks about the nuclear discontinuity, and hints at an intelligence discontinuity-- but I also went searching for evidence of a discontinuity in cognition and didn't find one. You would expect cognitive scientists to have found this by now.
Hard to fin...
Fake Journal Club, now coming to a forum near you! Today's winner, the gear to ascension, will receive one (1) gold-plated gear for their gear collection!
I expect Magnus Carlsen to be closer in Elo to a bounded superintelligence than to a median human.
Seems like this sort of claim could be something tractable that would qualify as material progress on understanding bounds to superintelligence? I'm thinking about results such as this.
However I think that post's title oversells the result-- from the paper:
...This paper has demonstrated that even superhuman agents can be vulnerable to adversarial policies. However, our results do not establish how common such vulnerabilities are: it is possible Go-playing AI syst
This is completely absurd, because actual superintelligences are just going to draw each other 100% of the time. Ergo, there can never be a one-million Elo chess engine.
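To make the draw point concrete, here is the textbook Elo expected-score formula (nothing engine-specific here, just the standard rating math):

def elo_expected_score(rating_a, rating_b):
    # Standard Elo expected score of player A against player B
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A million-point gap predicts A wins essentially every game...
print(elo_expected_score(1_000_000, 3_600))  # ~1.0

# ...but if two engines draw every game, each scores 0.5 per game, which is
# only consistent with a rating gap of zero. Universal draws cap how far
# measured ratings can separate, regardless of the engines' underlying strength.
print(elo_expected_score(3_600, 3_600))  # 0.5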
Do you have some idea of where the ceiling might be, that you can say that with confidence?
Just looking at this, it seems like research in chess has slowed down. Makes sense. But did we actually check if we were near a chess capabilities ceiling before we slowed down? I'm wondering if seeing how far we can get above human performance could give us some data about limits to superintelligence.
Everyone here acting like this makes him some kind of soothsayer is utterly ridiculous. I don't know when it became cool and fashionable to toss off your epistemic humility in the face of eternity; I guess it was before my time.
The basilisk is just Pascal's mugging for edgelords.
Maybe you got into trouble for talking about that because you are rude and presumptive?
definitely
as a human talking about ASI, the word 'definitely' is cope. You have no idea whatsoever, but you want to think you do. Okay.
extract all the info it could
we don't know how information works at small scales, and we don't know whether an AI would either. We don't have any idea how long it would take to "extract all the info it could", so this phrase leaves a huge hole.
them maybe simulate us
which presumes that it is as arrogant as you in 'knowing' what it can 'def...
Maybe you got into trouble for talking about that because you are rude and presumptive?
I think this is just a nod to how he's literally Roko, for whom googling "Roko simulation" gives a Wikipedia article on what happened last time.
That isn't my argument, my argument is just that the general tone seems too defeatist.
The question asker was under the impression that the probabilities were 99.X% against anything okay. My only argument was that this is wrong, and there are good reasons that this is wrong.
Where the p(doom) lies between 99 and 1 percent is left as an exercise for posterity. I'm not totally unhinged in my optimism, I just think the tone of certain doom is poorly founded and there are good reasons to have some measure of hope.
Not just 'i dunno, maybe it will be fine' but real reasons why it could conceivably be fine. Again, the probabilities are up for debate, I only wanted to present some concrete reasons.
The information could be instrumentally useful for any of the following Basic AI Drives:
Just to preserve information. It's not every day that you come across a thermodynamic system that has been evolving so far from equilibrium for so long. There is information here.
In general, I feel like a lot of people in discussion about ASI seem to enjoy fantasizing about science fiction apocalypses of various kinds. Personally I'm not so interested in exercises in fancy, rather looking at ways physical laws might imply that 'strong orthogonality' is unlikely to obtain in reality.
Haha, totally agree- I'm very much at the limit of what I can contribute.
In an 'Understanding Entropy' seminar series I took part in a long time ago, we discussed measures of complexity and such things. Nothing was clear then or is now, but the thermodynamic arrow of time, plus the second law of thermodynamics, plus something-something-complexity, plus the Fermi observation, seems to leave a lot of room for the idea that this planet is special, even from a totally misanthropic frame.
Enjoy the article!
"Whatever happened here is a datapoint about matter and energy doing their usual thing over a long period of time."
Not all thermodynamic systems are created equal. I know enough about information theory to know that making bold claims about what is interesting and meaningful is unwise. But I also know it is not certain that there is no objective difference between a photon wandering through a vacuum and a butterfly.
Here is one framework for understanding complexity that applies equally well for stars, planets, plants, animals, humans and AIs. It is possibl...
Whatever happened here is an interesting datapoint about the long-term evolution of thermodynamic systems away from equilibrium.
From the biological anchors paper:
This implies that the total amount of computation done over the course of evolution from the first animals with neurons to humans was (~1e16 seconds) * (~1e25 FLOP/s) = ~1e41 FLOP.
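A quick sanity check of the quoted arithmetic, using only the paper's own estimates:

seconds_of_neural_evolution = 1e16  # ~1e16 s since the first animals with neurons (paper's estimate)
neural_flops_per_second = 1e25      # ~1e25 FLOP/s of neural computation on Earth (paper's estimate)
print(f"{seconds_of_neural_evolution * neural_flops_per_second:.0e} FLOP")  # 1e+41 FLOP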
Note that this is just computation of neurons! So the total amount of computation done on this planet is much larger.
This is just illustrative, but the point is that what happened here is not so trivial or boring th...
If the ASI was 100% certain that there was no interesting information embedded in the Earth's ecosystems that it couldn't trivially simulate, then I would agree.
Do you pick up every penny that you pass in the street?
The amount of energy and resources on Earth would be a rounding error in an ASI's calculations. And it would be a rounding error that happens to be incredibly complex and possibly unique!
Maybe a more appropriate question is, do you pick every flower that you pass in the park? What if it was the only one?
If there was a system which was really good at harvesting energy and it was maxed out on intelligence, atoms might be very valuable, especially atoms close to where it is created.
The number of atoms on earth is so tiny. Why not just head to the asteroid belt where you can really build?
I'm not sure what you think I believe, but yeah I think we should be looking at scenarios in between the extremes.
I was giving reasons why I maintain some optimism, and maintaining optimism while reading Yudkowsky leaves me in the middle, where actions can be taken.
Very cool! Could you share your code at all? I'd love to explore this a little.
I adore the broccoli tree. I would be very happy to convert the dataset you used to make those pngs into an interactive network visualization and share it with you as an index.html. It would take all of an hour.
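Roughly what I have in mind, as a minimal sketch (I'm assuming the tree can be flattened into a CSV of parent/child pairs; the 'tree_edges.csv' name and column layout are my invention, not your actual format):

import csv
import networkx as nx
from pyvis.network import Network

# Assumed input: one (parent, child) edge per row -- hypothetical format
graph = nx.DiGraph()
with open("tree_edges.csv") as f:
    for parent, child in csv.reader(f):
        graph.add_edge(parent, child)

# Render as a standalone, pannable/zoomable index.html
net = Network(height="900px", width="100%", directed=True)
net.from_nx(graph)
net.save_graph("index.html")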
I do kind of agree with the other comments that, having noticed something, finding more of that stuff in that area is not so surprising. I think it would be good to get more context and explore the region more before concluding that that particular set of generations is s...