This is really good, and it'll be required reading for my new 'Psychology and AI' class that I'll teach next year.
Students are likely to ask 'If the blob can figure out so much about the world, and modify its strategies so radically, why does it still want sugar? Why not just decide to desire something more useful, like money, power, and influence?'
Shutting down OpenAI entirely would be a good 'high level change', at this point.
Well, I'm seeing no signs whatsoever that OpenAI would ever seriously consider slowing, pausing, or stopping its quest for AGI, no matter what safety concerns get raised. Sam Altman seems determined to develop AGI at all costs, despite all risks, ASAP. I see OpenAI as betraying virtually all of its founding principles, especially since the strategic alliance with Microsoft, and with the prospect of colossal wealth for its leaders and employees.
At this point, I'd rather spend $5-7 trillion on a Butlerian Jihad to stop OpenAI's reckless hubris.
Human intelligence augmentation is feasible over a scale of decades to generations, given iterated polygenic embryo selection.
I don't see any feasible way that gene editing or 'mind uploading' could work within the next few decades. Gene editing for intelligence seems unfeasible because human intelligence is a massively polygenic trait, influenced by thousands to tens of thousands of quantitative trait loci. Gene editing can fix major mutations, to nudge IQ back up to normal levels, but we don't know of any single genes that can boost IQ above the no...
Gene editing can fix major mutations, to nudge IQ back up to normal levels, but we don't know of any single genes that can boost IQ above the normal range
This is not true. We know of enough IQ variants TODAY to raise it by about 30 points in embryos (and probably much less in adults). But we could fix that by simply collecting more data from people who have already been genotyped.
None of them individually have a huge effect, but that doesn’t matter much. It just means you need to perform more edits.
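(For intuition, here's a minimal back-of-the-envelope sketch of the purely additive model behind that claim. All of the numbers -- the count of edited variants, the average per-variant effect, the editing success rate -- are made-up illustrative assumptions, not estimates from any real GWAS.)

```python
# Minimal sketch of the additive model: the total expected gain is just the
# sum of many small per-variant effects. All numbers below are illustrative
# assumptions, not real GWAS estimates.

n_edits = 500          # hypothetical number of IQ-associated variants edited
mean_effect = 0.06     # assumed average gain per successful edit, in IQ points
success_rate = 0.9     # assumed fraction of edits that actually take effect

expected_gain = n_edits * mean_effect * success_rate
print(f"Expected gain under these assumptions: ~{expected_gain:.0f} IQ points")
# -> ~27 IQ points; smaller per-variant effects just require more edits.
```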
If we want safe AI, we have to slow AI development.
I agree...
Tamsin -- interesting points.
I think it's important for the 'Pause AI' movement (which I support) to help politicians, voters, and policy wonks understand that 'power to do good' is not necessarily correlated with 'power to deter harm' or the 'power to do indiscriminate harm'. So, advocating for caution ('OMG AI is really dangerous!') should not be read as implying 'power to do good' or 'power to deter harm' -- which could incentivize gov'ts to pursue AI despite the risks.
For example, nuclear weapons can't really do much good (except maybe for blasting incomin...
gwern - The situation is indeed quite asymmetric, insofar as some people at Lightcone seem to have launched a poorly-researched slander attack on another EA organization, Nonlinear, which has been suffering serious reputational harm as a result. Whereas Nonlinear did not attack Lightcone or its people, except insofar as necessary to defend themselves.
Treating Nonlinear as a disposable organization, and treating its leaders as having disposable careers, seems ethically very bad.
Naive question: why are the disgruntled ex-employees who seem to have made many serious false allegations the only ones whose 'privacy' is being protected here?
The people who were accused at Nonlinear aren't able to keep their privacy.
The guy (Ben Pace) who published the allegations isn't keeping his privacy.
But the people who are at the heart of the whole controversy, whose allegations are the whole thing we've been discussing at length, are protected by the forum moderators? Why?
This is a genuine question. I don't understand the ethical or rational principles that you're applying here.
There's a big difference between arguing that someone shouldn't be able to stay anonymous, and unilaterally posting names. Arguing against allowing anonymity (without posting names) would not have been against the rules. But, we're definitely not going to re-derive the philosophy of when anonymity should and shouldn't be allowed, after names are already posted. The time to argue for an exception was beforehand, not after the fact.
We can talk about under what conditions revealing the identities of people who've made false accusations is appropriate, and about whether that accurately describes anything Alice and/or Chloe have done. But jumping straight to deanonymizing is seriously premature.
There's a human cognitive bias that may be relevant to this whole discussion, but that may not be widely appreciated in Rationalist circles yet: gender bias in 'moral typecasting'.
In a 2020 paper, my U. New Mexico colleague Tania Reynolds and coauthors found a systematic bias for women to be more easily categorized as victims and men as perpetrators, in situations where harm seems to have been done. They ran six studies in four countries (total N=3,317).
(Ever since a seminal paper by Gray & Wegner (2009), there's been a fast-growing literature on ...
Whatever people think about this particular reply by Nonlinear, I hope it's clear to most EAs that Ben Pace could have done a much better job of fact-checking his allegations against Nonlinear and of getting their side of the story.
In my comment on Ben Pace's original post 3 months ago, I argued that EAs & Rationalists are not typically trained as investigative journalists, and we should be very careful when we try to do investigative journalism -- an epistemically and ethically very complex and challenging profession, which typically requires years of t...
Ehhh, I don't really agree. To me the load-bearing facts seem about right. I appreciate that not much was to be gained by painstakingly discussing every detail. Pace was pretty clear about how much was hearsay. I would have liked more room for response until the suing threat came. At that point, yeah, publish.
I'm actually quite confused by the content and tone of this post.
Is it a satire of the 'AI ethics' position?
I speculate that the downvotes might reflect other people being confused as well?
Fair enough. Thanks for replying. It's helpful to have a little more background on Ben. (I might write more, but I'm busy with a newborn baby here...)
Jim - I didn't claim that libel law solves all problems in holding people to higher epistemic standards.
Often, it can be helpful just to incentivize avoiding the most egregious forms of lying and bias -- e.g. punishing situations when 'the writer had actual knowledge that the claims were false, or was completely indifferent to whether they were true or false'.
Rob - you claim 'it's very obvious that Ben is neither deliberately asserting falsehoods, nor publishing "with reckless disregard"'.
Why do you think that's obvious? We don't know the facts of the matter. We don't know what information he gathered. We don't know the contents of the interviews he did. As far as we can tell, there was no independent editing, fact-checking, or oversight in this writing process. He's just a guy who hasn't been trained as an investigative journalist, who did some investigative journalism-type research, and wrote it up.
Number of h...
Why do you think that's obvious?
I know Ben, I've conversed with him a number of times in the past and seen lots of his LW comments, and I have a very strong and confident sense of his priorities and values. I also read the post, which "shows its work" to such a degree that Ben would need to be unusually evil and deceptive in order for this post to be an act of deception.
I don't have any private knowledge about Nonlinear or about Ben's investigation, but I'm happy to vouch for Ben, such that if he turns out to have been lying, I ought to take a credibility ...
(Note: this was cross-posted to EA Forum here; I've corrected a couple of minor typos, and swapped out 'EA Forum' for 'LessWrong' where appropriate)
A note on LessWrong posts as (amateur) investigative journalism:
When passions are running high, it can be helpful to take a step back and assess what's going on here a little more objectively.
There are all different kinds of LessWrong posts that we evaluate using different criteria. Some posts announce new funding opportunities; we evaluate these in terms of brevity, clarity, relevance, and useful ...
A brief note on defamation law:
The whole point of having laws against defamation, whether libel (written defamation) or slander (spoken defamation), is to hold people to higher epistemic standards when they communicate very negative things about people or organizations -- especially negative things that would stick in readers' or listeners' minds in ways that would be very hard for subsequent corrections or clarifications to counteract.
Without making any comment about the accuracy or inaccuracy of this post, I would just point out that nobody in EA should ...
(Copying my response from the EA Forum)
I agree there are some circumstances under which libel suits are justified, but the net effect of the availability of libel suits strikes me as extremely negative for communities like ours, and I think it's very reasonable to have very strong norms against threatening or going through with these kinds of suits. Just because an option is legally available doesn't mean that a community has to be fine with that option being pursued.
...That is the whole point and function of defamation law: to promote especially high standa
What you described was perhaps the intent behind the law, but that's not necessarily how it is used in practice. You can use the law to intimidate people who have less money than you, simply by giving the money to a lawyer... and then the other side needs to spend about the same money on their lawyer... or risk losing the case. "The process is the punishment."
(I have recently contributed money to a defense fund of a woman who exposed a certain criminal organization in my country. The organization was disbanded, a few members were convicted, one of them end...
The whole point of having laws against defamation, whether libel (written defamation) or slander (spoken defamation), is to hold people to higher epistemic standards when they communicate very negative things about people or organizations
This might be true of some other country's laws against defamation, but it is not true of defamation law in the US. Under US law, merely being wrong, sloppy, and bad at reasoning would not be sufficient to make something count as defamation; it only counts if the writer had actual knowledge that the claims were false, or was completely indifferent to whether they were true or false.
Without making any comment about the accuracy or inaccuracy of this post, I would just point out that nobody in EA should be shocked that an organization (e.g. Nonlinear) that is being libeled (in its view) would threaten a libel suit to deter the false accusations (as they see them), and to nudge the author (e.g. Ben Pace) towards making sure that their negative claims are factually correct and contextually fair.
Wikipedia claims: "The 1964 case New York Times Co. v. Sullivan, however, radically changed the nature of libel law in the United States by esta...
Gordon - I was also puzzled by the initial downvotes. But they happened so quickly that I figured the downvoters hadn't actually read or digested my essay. Disappointing that this happens on LessWrong, but here we are.
Max - I think your observations are right. The 'normies', once they understand AI extinction risk, tend to have much clearer, more decisive, more negative moral reactions to AI than many EAs, rationalists, and technophiles tend to have. (We've been conditioned by our EA/Rat subcultures to think we need to 'play nice' with the AI industry, no matter how sociopathic it proves to be.)
Whether a moral anti-AI backlash can actually slow AI progress is the Big Question. I think so, but my confidence interval on this issue is pretty wide. As an evolutionary...
Maybe. But at the moment, the US is really the only significant actor in the AGI development space. Other nations are reacting in various ways, ranging from curious concern to geopolitical horror. But if we want to minimize the risk of a nation-state AI arms race, the burden is on the US companies to Just Stop Unilaterally Driving The Arms Race.
I'm predicting that an anti-AI backlash is likely, given human moral psychology and the likely applications of AI over the next few years.
In further essays I'm working on, I'll probably end up arguing that an anti-AI backlash may be a good strategy for reducing AI extinction risk -- probably much faster, more effective, and more globally applicable than any formal regulatory regime or AI safety tactics that the AI industry is willing to adopt.
Well, the AI industry and the pro-AI accelerationists believe that there is an 'immense upside of AGI', but that is a highly speculative, faith-based claim, IMHO. (The case for narrow AI having clear upsides is much stronger, I think.)
It's worth noting that almost every R&D field that has been morally stigmatized -- such as intelligence research, evolutionary psychology, and behavior genetics -- also offered huge and transformative upsides to society, when the field first developed. Until they got crushed by political demonization, and their potential ...
I don't think so. My friend Peter Todd's email addresses typically include his middle initial 'm'.
Puzzling.
mwatkins - thanks for a fascinating, detailed post.
This is all very weird and concerning. As it happens, my best friend since grad school is Peter Todd, professor of cognitive science, psychology, & informatics at Indiana University. We used to publish a fair amount on neural networks and genetic algorithms back in the 90s.
https://psych.indiana.edu/directory/faculty/todd-peter.html
That's somewhat helpful.
I think we're coming at this issue from different angles -- I'm taking a very evolutionary-functional view focused on what selection pressures shape psychological adaptations, what environmental information those adaptations need to track (e.g. snake! or pathogen!), what they need to represent about the world (e.g. imminent danger of death from threat X!), and what behaviors they need to trigger (e.g. run away!).
From that evolutionary-functional view, the 'high-level cognitive properties' of 'fitness affordances' are the...
If we're dead-serious about infohazards, we can't just be thinking in terms of 'information that might accidentally become known to others through naive LessWrong newbies sharing it on Twitter'.
Rather, we need to be thinking in terms of 'how could we actually prevent the military intelligence analysts of rival superpowers from being able to access this information'?
My personal hunch is that there are very few ways we could set up sites, security protocols, and vetting methods that would be sufficient to prevent access by a determined government. Whic...
Bluntly: if you write it on LessWrong or the Alignment Forum, or send it to a particular known person, governments will get a copy if they care to. Cybersecurity against state actors is really, really, really hard. LessWrong is not capable of state-level cyberdefense.
If you must write it at all: do so with hardware which has been rendered physically unable to connect to the internet, and distribute only on paper, discussing only in areas without microphones. Consider authoring only on paper in the first place. Note that physical compromise of your home, w...
If we're nowhere close to solving alignment well enough that even a coarse-grained description of actual human values is relevant yet, then I don't understand why anyone is advocating further AI research at this point.
Also, 'avoiding deceptive alignment' doesn't really mean anything if we don't have a relatively rich and detailed description of what 'authentic alignment' with human values would look like.
I'm truly puzzled by the resistance that the AI alignment community has against learning a bit more about the human values we're allegedly aligning with.
GeneSmith -- I guess I'm still puzzled about how Shard Theory prevents wireheading (broadly construed); I just don't see it as a magic bullet that can keep agents focused on their ultimate goals. I must be missing something.
And, insofar as Shard Theory is supposed to be an empirically accurate description of human agents, it would need to explain why some people become fentanyl addicts who might eventually overdose, and others don't. Or why some people pursue credentials and careers at the cost of staying childless... while others settle down young, have s...
Akash -- this is very helpful; thanks for compiling it!
I'm struck that much of the advice for newbies interested in 'AI alignment with human values' is focused very heavily on the 'AI' side of alignment, and not on the 'human values' side of alignment -- despite the fact that many behavioral and social sciences have been studying human values for many decades.
It might be helpful to expand lists like these to include recommended papers, books, blogs, videos, etc that can help alignment newbies develop a more sophisticated understanding of the human psycholo...
GeneSmith -- when people in AI alignment or LessWrong talk about 'wireheading', I understood that not to refer to people literally asking neurosurgeons to stick wires into their brains, but rather to a somewhat larger class of ways to hack one's own reward systems through the usual perceptual input channels.
I agree that humans are not 'reward maximizing agents', whatever that is supposed to mean in reference to actual evolved organisms with diverse, heterogeneous, & domain-specific motivational systems.
Quintin (and also Alex) - first, let me say, thank you for the friendly, collegial, and constructive comments and replies you've offered. Many folks get reactive and defensive when they're hit with a 6,000-word critique of their theory, but you've remained constructive and intellectually engaged. So, thanks for that.
On the general point about Shard Theory being a relatively 'Blank Slate' account, it might help to think about two different meanings of 'Blank Slate' -- mechanistic versus functional.
A mechanistic Blank Slate approach (which I take Shard Theor...
TurnTrout -- I think the 'either/or' framing here is misleading about the way that genomes can adapt to maximize survival and minimize death.
For example, jumping spiders have evolved special secondary eyes pointing backwards that specifically detect predators from behind that might want to eat them. At the functional level of minimizing death, these eyes 'hardcode death-fear' in a very real and morphological way. Similarly, many animals vulnerable to predators evolve eye locations on the sides of their heads, to maximize degrees of visual coverage they can...
Jan - well said, and I strongly agree with your perspective here.
Any theory of human values should also be consistent with the deep evolutionary history of the adaptive origins and functions of values in general - from the earliest Cambrian animals with complex nervous systems through vertebrates, social primates, and prehistoric hominids.
As William James pointed out in 1890 (paraphrasing here), human intelligence depends on humans having more evolved instincts, preferences, and values than other animals, not fewer.
For what it's worth, I wrote a critique of Shard Theory here on LessWrong (on Oct 20, 2022) from the perspective of behavior genetics and the heritability of values.
The comments include some helpful replies and discussions with Shard Theory developers Quintin Pope and Alex Turner.
I'd welcome any other feedback as well.
Quintin -- yes, indeed, one of the reasons I was excited about Shard Theory was that it has these different emphases you mention (e.g. 'multi-optimizer dynamics, values handshakes among shards, origins of self-reflective modeling, origins of biases, moral reflection as shard deliberation'), which I thought might actually be useful to develop and integrate with in evolutionary psychology and other branches of psychology, not just in AI alignment.
So I wanted to see if Shard Theory could be made a little more consistent with behavior genetics and ev psy...
Quintin & Alex - this is a very tricky issue that's been discussed in evolutionary psychology since the late 1980s.
Way back then, Leda Cosmides & John Tooby pointed out that the human genome will 'offload' any information it can that's needed for brain development onto any environmental regularities that can be expected to be available externally, out in the world. For example, the genome doesn't need to specify everything about time, space, and causality that might be relevant in reliably building a brain that can do intuitive physics -- as ...
GeneSmith -- thanks for your comment. I'll need to think about some of your questions a bit more before replying.
But one idea popped out to me: the idea that shard theory offers 'a good explanation of how humans were able to avoid wireheading.'
I don't understand this claim on two levels:
PS, Gary Marcus at NYU makes some related points about Blank Slate psychology being embraced a bit too uncritically by certain strands of thinking in AI research and AI safety.
His essay '5 myths about learning and innateness'
His essay 'The new science of alt intelligence'
His 2017 debate with AI researcher Yann LeCun, 'Does AI need more innate machinery?'
I don't agree with Gary Marcus about everything, but I think his views are worth a bit more attention from AI alignment thinkers.
tailcalled -- these issues of variance, canalization, quality control, etc are very interesting.
For example, it's very difficult to understand why so many human mental disorders are common, heritable, and harmful -- why wouldn't the genetic variants that cause schizophrenia or major depression already have been eliminated by selection? Our BBS target article in 2006 addressed this.
Conversely, it's a bit puzzling that the coefficient of additive genetic variation in human brain size is lower than might be expected, according to our 2007 meta-analysis.
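(For readers who haven't met that metric: the coefficient of additive genetic variation scales the additive genetic variance by the trait mean, so that traits measured in different units can be compared. The standard quantitative-genetics definition is:)

```latex
% Coefficient of additive genetic variation: V_A is the additive genetic
% variance of the trait, \bar{X} is the trait's phenotypic mean.
CV_A = \frac{100\,\sqrt{V_A}}{\bar{X}}
```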
In gen...
Jacob - thanks! Glad you found that article interesting. Much appreciated. I'll read the linked essays when I can.
It's hard to know how to respond to this comment, which reveals some fundamental misunderstandings of heritability and of behavior genetics methods. The LessWrong protocol is 'If you disagree, try getting curious about what your partner is thinking'. But in some cases, people unfamiliar with a field have the same old misconceptions about the field, repeated over and over. So I'm honestly having trouble arousing my curiosity....
The quote from habryka doesn't make sense to me, and doesn't seem to understand how behavior genetic studies estimate heritabilitie...
Jacob, I'm having trouble reconciling your view of brains as 'Universal Learning Machines' (and almost everything being culturally transmitted), with the fact that millions of other animal species show exactly the kinds of domain-specific adaptive responses studied in evolutionary biology, animal behavior research, and evolutionary psychology.
Why would 'fear of death' be 'culturally transmitted' in humans, when thousands of other vertebrate species show many complex psychological and physiological adaptations to avoid accidents, starvation, parasitism, an...
You seem to be making some very sweeping claims about heritability here. In what sense is 'heritability not what I think'?
Do you seriously think that moderate heritability doesn't say anything at all about how much genes matter, versus how much 'non-agentic things can influence a trait'?
My phrasing was slightly tongue-in-cheek; I agree that sex hormones, hormone receptors in the brain, and the genomic regulatory elements that they activate, have pervasive effects on brain development and psychological sex differences.
Off topic: yes, I'm familiar with evolutionary game theory; I was senior research fellow in an evolutionary game theory center at University College London 1996 - 2000, and game theory strongly influenced my thinking about sexual selection and social signaling.
Steven -- thanks very much for your long, thoughtful, and constructive comment. I really appreciate it, and it does help to clear up a few of my puzzlements about Shard Theory (but not all of them!).
Let me ruminate on your comment, and read your linked essays.
I have been thinking about how evolution can implement different kinds of neural architectures, with different degrees of specificity versus generality, ever since my first paper in 1989 on using genetic algorithms to evolve neural networks. Our 1994 paper on using genetic algorithms to evolve sensori...
Jacob - I read your 2015 essay. It is interesting and makes some fruitful points.
I am puzzled, though, about when nervous systems are supposed to have evolved this 'Universal Learning Machine' (ULM) capability. Did ULMs emerge with the transition from invertebrates to vertebrates? From rat-like mammals to social primates? From apes with 400 cc brains to early humans with 1100 cc brains?
Presumably bumblebees (1 million neurons) don't have ULM capabilities, but humans (80 billion neurons) allegedly do. Where is the threshold between them -- given that bumble...
Charlie - thanks for offering a little more 'origin story' insight into Shard Theory, and for explaining what Quintin Pope was trying to express in that passage.
Honestly, I still don't get it. The 'developmental recipe' that maps from genotype to phenotype, for any complex adaptation, is usually opaque, complicated, uninterpretable, and full of complex feedback loops, regulatory systems, and quality control systems. These are typically beyond all human comprehension, because there were never any evolutionary selection pressures for that development...
Jacob -- thanks for your comment. It offers an interesting hypothesis about some analogies between human brain systems and computer stuff.
Obviously, there's not enough information in the human genome to specify every detail of every synaptic connection. Nobody is claiming that the genome codes for that level of detail. Just as nobody would claim that the genome specifies every position for every cell in a human heart, spine, liver, or lymphatic system.
I would strongly dispute that it's the job of 'behavior genetics, psychology, etc' to fit thei...
Peter -- I think 'hard coding' and 'hard wiring' are very misleading ways to think about brain evolution and development; they're based way too much on the hardware/software distinction in computer science, and on 1970s/1980s cognitive science models inspired by computer science.
Apparently it's common in some AI alignment circles to view the limbic system as 'hard wired', and the neocortex as randomly initialized? Interesting if true. But I haven't met any behavior geneticists, neuroscientists, evolutionary psychologists, or developmental psychologists who wo...
I haven't read the universal learning hypothesis essay (2015) yet, but at first glance, it also looks vulnerable to a behavior genetic critique (and probably an evolutionary psychology critique as well).
In my view, evolved predispositions shape many aspects of learning, including Bayesian priors about how the world is likely to work, expectations about how contingencies work (e.g. the Garcia Effect that animals learn food aversions more strongly if the lag between food intake and nausea/distress is a few minutes/hours rather than immediate), domain-specifi...
Here's the thing, just_browsing.
Some people want to stop human extinction from unaligned Artificial Superintelligence that's developed by young men consumed by reckless, misanthropic hubris -- by using whatever persuasion and influence techniques actually work on most people.
Other people want to police 'vibes' and 'cringe' on social media, and feel morally superior to effective communicators.
Kat Woods is the former.
If you have evidence her communication strategy works, you are of course welcome to provide it. (Also, "using whatever communication strategy actually works" is not necessarily a good thing to do! Lying, for example, works very well on most people, and yet it would be bad to promote AI safety with a campaign of lies).