What happened to black swan and tail risk robustness (section 2.1 in "Unsolved Problems in ML Safety")?
It's hard to say. This CLR article lists some advantages that artificial systems have over humans. Also see this section of 80k's interview with Richard Ngo:
...Rob Wiblin: One other thing I’ve heard, that I’m not sure what the implication is: signals in the human brain — just because of limitations and the engineering of neurons and synapses and so on — tend to move pretty slowly through space, much less than the speed of electrons moving down a wire. So in a sense, our signal propagation is quite gradual and our reaction times are really slow compared to wha
The cyborgism post might be relevant:
...Executive summary: This post proposes a strategy for safely accelerating alignment research. The plan is to set up human-in-the-loop systems which empower human agency rather than outsource it, and to use those systems to differentially accelerate progress on alignment.
- Introduction: An explanation of the context and motivation for this agenda.
- Automated Research Assistants: A discussion of why the paradigm of training AI systems to behave as autonomous agents is both counterproductive and dangerous.
- Becoming a Cybor
I've grown increasingly alarmed and disappointed by the number of highly-upvoted and well-received posts on AI, alignment, and the nature of intelligent systems, which seem fundamentally confused about certain things.
Can you elaborate on how all these linked pieces are "fundamentally confused"? I'd like to see a detailed list of your objections. It's probably best to make a separate post for each one.
I believe that Marcus's point is that there are classes of problems that tend to be hard for LLMs (biological reasoning, physical reasoning, social reasoning, practical reasoning, object and individual tracking, non sequiturs). The argument is that problems in these classes will continue to be hard.
Yeah this is the part that seems increasingly implausible to me. If there is a "class of problems that tend to be hard ... [and] will continue to be hard," then someone should be able to build a benchmark that models consistently struggle with over time.
Oh I see; I read too quickly. I interpreted your statement as "Anthropic clearly couldn't care less about shortening timelines," and I wanted to show that the interpretability team seems to care.
Especially since this post is about capabilities externalities from interpretability research, and your statement introduces Anthropic as "Anthropic, which is currently the biggest publisher of interp-research." Some readers might draw corollaries like "Anthropic's interpretability team doesn't care about advancing capabilities."
Ezra Klein listed some ideas (I've added some bold):
...The first is the question — and it is a question — of interpretability. As I said above, it’s not clear that interpretability is achievable. But without it, we will be turning more and more of our society over to algorithms we do not understand. If you told me you were building a next generation nuclear power plant, but there was no way to get accurate readings on whether the reactor core was going to blow up, I’d say you shouldn’t build it. Is A.I. like that power plant? I’m not sure. But that’s a questi
Anthropic, which is currently the biggest publisher of interp-research, clearly does not have a commitment to not work towards advancing capabilities
This statement seems false based on this comment from Chris Olah.
Here's a list of arguments for AI safety being less important, although some of them are not object-level.
The model knows it’s being trained to do something out of line with its goals during training and plays along temporarily so it can defect later. That implies that differential adversarial examples exist in training.
I don't think this implication is deductively valid; I don't think the premise entails the conclusion. Can you elaborate?
I think this post's argument relies on that conclusion, along with an additional assumption that seems questionable: that it's fairly easy to build an adversarial training setup that distinguishes the design objective from al...
Some comments:
A large amount of the public thinks AGI is near.
This links to a poll of Lex Fridman's Twitter followers, which doesn't seem like a representative sample of the US population.
they jointly support a greater than 10% likelihood that we will develop broadly human-level AI systems within the next decade.
Is this what you're arguing for when you say "short AI timelines"? I think that's a fairly common view among people who think about AI timelines.
AI is starting to be used to accelerate AI research.
My sense is that Copilot is by far ...
From maybe 2013 to 2016, DeepMind was at the forefront of hype around AGI. Since then, they've done less hype.
I'm confused about the evidence for these claims. What are some categories of hype-producing actions that DeepMind did between 2013 and 2016 and hasn't done since? Or just examples.
One example is the AlphaGo documentary -- DeepMind has not made any other documentaries about their results. Another related example is "playing your Go engine against the top Go player in a heavily publicized event."
...In the wake of big public releases like ChatGPT and Sy
I would imagine that first, the AGI must be able to create a growing energy supply and a robotic army capable of maintaining and extending this supply. This will require months or years of having humans help produce raw materials and the factories for materials, maintenance robots and energy systems.
An AGI might be able to do these tasks without human help. Or it might be able to coerce humans into doing these tasks.
...Third, assuming the AGI used us to build the energy sources, robot armies, and craft to help them leave this planet, (or build this themselves
Imagine you are the CEO of OpenAI, and your team has finished building a new, state-of-the-art AI model. You can:
Do you think (1) is riskier than (2)? I think the answer depends heavily on the details of the test.
On a related note, this part might be misleading:
I’m just really, really skeptical that a bunch of abstract work on decision theory and similar [from MIRI and similar independent researchers] will get us there. My expectation is that alignment is an ML problem, and you can’t solve alignment utterly disconnected from actual ML systems.
I think earlier forms of this research focused on developing new, alignable algorithms, rather than aligning existing deep learning algorithms. However, a reader of the first quote might think "wow, those people actually thoug...
In my opinion it was the right call to spend this amount of funding on the office for the last ~6 months of its existence even when we thought we'd likely do something quite different afterwards
This is confusing to me. Why not do "something quite different" from the start?
I'm trying to point at opportunity costs more than "gee, that's a lot of money, the outcome had better be good!" There are many other uses for that money besides the Lightcone offices.
...A smart, competent, charismatic, person with horrible ethics will enter the office because they've manage
I'm confused. Here's a conversation I just had with GPT-4, with prompts in italics:
*...a, b, c, d, e, f, g. What is the third item in the list?*
The third item in the list is c.
*What is the third word of this sentence?*
To accurately identify the third word of the sentence, I need the sentence you are referring to. Please provide the sentence in question.
*The sentence is "What is the third word of this sentence?" See what I did there?*
Yes, I see what you did there. The third word of the sentence "What is the third word of this sentence?" is "the".
What is the third le
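For anyone who wants to poke at this themselves, here's a minimal sketch of replaying the first two prompts through the OpenAI chat completions API. The SDK version (openai v1+), the model name "gpt-4", and the environment-variable setup are my assumptions, not details from the conversation above.

```python
# Minimal sketch: replay the first two prompts above as a multi-turn chat.
# Assumes the openai Python package (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

history = []
for prompt in [
    "a, b, c, d, e, f, g. What is the third item in the list?",
    "What is the third word of this sentence?",
]:
    history.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(model="gpt-4", messages=history)
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print(f"> {prompt}\n{answer}\n")
```

Answers will vary from run to run, so don't expect to reproduce the exact replies above.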
I pasted the YouTube video link into AssemblyAI's Playground (which I think uses Conformer-1 for speech to text) and generated a transcript, available at this link. However, the transcript lacks labels for who is speaking.
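If speaker labels matter, AssemblyAI's Python SDK supports speaker diarization, so the transcript could be regenerated with speakers tagged. Here's a minimal sketch; I haven't checked what the Playground does under the hood, and the SDK wants a direct audio file or URL, so the YouTube audio would need to be extracted first (e.g. with yt-dlp).

```python
# Minimal sketch: speaker-labeled transcription with the assemblyai Python SDK.
# "interview_audio.mp3" is a placeholder for audio extracted from the video.
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder

config = aai.TranscriptionConfig(speaker_labels=True)
transcript = aai.Transcriber().transcribe("interview_audio.mp3", config=config)

# Utterances come back grouped by diarized speaker (A, B, C, ...).
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```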
I asked GPT-4 to summarize the article and then come up with some alternative terms; here are a few I like:
I also asked for some idioms. "Seeing the forest but not the trees" seems apt.
Brain computation speed is constrained by upper neuron firing rates of around 1 kHz and axon propagation velocity of up to 100 m/s [43], which are both about a million times slower than current computer clock rates of near 1 GHz and wire propagation velocity at roughly half the speed of light.
Can you provide some citations for these claims? At the moment the only citation is a link to a Wikipedia article about nerve conduction velocity.
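For what it's worth, the "about a million times" comparisons do follow from the quoted figures themselves:

$$\frac{1\ \text{GHz}}{1\ \text{kHz}} = \frac{10^{9}\ \text{Hz}}{10^{3}\ \text{Hz}} = 10^{6}, \qquad \frac{0.5\,c}{100\ \text{m/s}} \approx \frac{1.5\times 10^{8}\ \text{m/s}}{10^{2}\ \text{m/s}} = 1.5\times 10^{6},$$

so my question is about the sourcing of the 1 kHz and 100 m/s figures, not the arithmetic.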
I greatly appreciate this post. I feel like "argh yeah it's really hard to guarantee that actions won't have huge negative consequences, and plenty of popular actions might actually be really bad, and the road to hell is paved with good intentions." With that being said, I have some comments to consider.
The offices cost $70k/month on rent [1], and around $35k/month on food and drink, and ~$5k/month on contractor time for the office. It also costs core Lightcone staff time which I'd guess at around $75k/month.
That is ~$185k/month and ~$2.22m/year. I won...
To be clear, I haven't seen many designs that people I respect believed to have a chance of actually working. If you work on the alignment problem or at an AI lab and haven't read Nate Soares' On how various plans miss the hard bits of the alignment challenge, I'd suggest reading it.
Can you explain your definition of the sharp left turn and why it will cause many plans to fail?
At the time of my writing, this comment is still the most recommended, with 910 recommendations. 2nd place has 877 recommendations:
Never has a technology been potentially more transformative and less desired or asked for by the public.
3rd place has 790 recommendations:
...“A.I. is probably the most important thing humanity has ever worked on. I think of it as something more profound than electricity or fire.”
Sundar Pichai’s comment beautifully sums up the arrogance and grandiosity pervasive in the entire tech industry—the notion that building machines t
Does anyone have thoughts on Justin Sung? He has a popular video criticizing active recall and spaced repetition. The argument: if you use better strategies for initially encountering an idea and storing it in long-term memory, then the corresponding forgetting curve will exhibit a more gradual decline, and you won't need to use flashcards as frequently.
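To make the claim concrete, here's one standard exponential model of the forgetting curve (my gloss; as far as I can tell, Sung doesn't commit to a specific formula):

$$R(t) = e^{-t/S},$$

where $R$ is retention, $t$ is time since learning, and $S$ is the memory's stability. The argument amounts to saying that better initial encoding raises $S$, so retention decays more slowly and reviews can be spaced much further apart before retention drops below whatever threshold would trigger a flashcard review.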
I see some red flags about Justin:
Microsoft is currently the #2 largest company on earth and is valued at almost $2 trillion.
What does "largest" mean? By revenue, Microsoft is 33rd (according to Wikipedia).
EDIT: I'm guessing you mean 2nd largest public corporation based on market capitalization.
That makes sense. My main question is: where is the clear evidence of human negligibility in chess? People seem to be misleadingly confident about this proposition (in general; I'm not targeting your post).
When a friend showed me the linked post, I thought "oh wow that really exposes some flaws in my thinking surrounding humans in chess." I believe some of these flaws came from hearing assertive statements from other people on this topic. As an example, here's Sam Harris during his interview with Eliezer Yudkowsky (transcript, audio):
...Obviously we’ll be get
AIs overtake humans. Humans become obsolete and their contribution is negligible to negative.
I'm confused why chess is listed as an example here. This StackExchange post suggests that cyborg teams (humans working alongside engines) are still better than engines alone. Overall, I'm struggling to find evidence for or against this claim (that humans are obsolete in chess), even though it's a pretty common point in discussions about AI.
I think it's worth noting Joe Carlsmith's thoughts on this post, available starting on page 7 of Kokotajlo's review of Carlsmith's power-seeking AI report (see this EA Forum post for other reviews).
...JC: I do think that the question of how much probability mass you concentrate on APS-AI by 2030 is helpful to bring out – it’s something I’d like to think more about (timelines wasn’t my focus in this report’s investigation), and I appreciate your pushing the consideration.
I read over your post on +12 OOMs, and thought a bit about your argument here. One b
Relevant tweet/quote from Mustafa Suleyman, the co-founder and CEO:
Suleyman's statements are either very specific capabilities predictions or incredibly vague remarks, like the one you brought up, that don't really inform us much. His interviews often revolve around how big and smart their future models will be, while also putting in a good word for their financial backers (mainly NVIDIA). I find it frustrating that a company with this much compute and potential impact on timelines has a CEO and main spokesperson who seems so out of touch with the domain he does business in.