All of jeffreycaruso's Comments + Replies

Fired from OpenAI's Superalignment team, Aschenbrenner now runs a fund that backs AGI-focused startups, according to The Information.

"Former OpenAI super-alignment researcher Leopold Aschenbrenner, who was fired from the company for allegedly leaking information, has started an investment firm to back startups with capital from former GitHub CEO Nat Friedman, investor Daniel Gross, Stripe CEO Patrick Collison and Stripe president John Collison, according to his personal website.

In a recent podcast interview, Aschenbrenner spoke about the ... (read more)

Leopold's interview with Dwarkesh is a very useful source of insight into what's going on in his mind.

What happened to his concerns over safety, I wonder?

He doesn't believe in a 'sharp left turn', which means he doesn't consider general intelligence to be a discontinuous (latent) capability spike such that alignment becomes significantly more difficult after it occurs. To him, alignment is simply a somewhat harder empirical-techniques problem, much like capabilities work. I assume he imagines behavior similar to current RLHF-ed models even as frontier labs have dou... (read more)

Your example of the janitor interrupting the scientist is a good demonstration of my point. I've organized over a hundred cybersecurity events featuring over a thousand speakers and I've never had a single janitor interrupt a talk. On the other hand, I've had numerous "experts" attempt to pass off fiction as fact, draw assumptions from faulty data, and generally behave far worse than any janitor might due to their inflated egos. 

Based on my conversations with computer science and philosophy professors who aren't EA-affiliated, and several who are, the... (read more)

1Amalthea
I'm echoing other commenters somewhat, but - personally - I do not see people being down-voted simply for having different viewpoints. I'm very sympathetic to people trying to genuinely argue against "prevailing" attitudes or simply trying to foster a better general understanding. (E.g. I appreciate Matthew Barnett's presence, even though I very much disagree with his conclusions and find him overconfident). Now, of course, the fact that I don't notice the kind of posts you say are being down-voted may be because they are sufficiently filtered out, which indeed would be undesirable from my perspective and good to know.
3the gears to ascension
can't comment on moderators, since I'm not one, but I'd be curious to see links you think were received worse than is justified and see if I can learn from them

I think you're too close to see objectively. I haven't observed any room for policy discussions in this forum that stray from what is acceptable to the mods and active participants. If a discussion doesn't allow for opposing viewpoints, it's of little value. In my experience, and from what I've heard from others who've tried posting here and quit, you have not succeeded in making this a forum where people with opposing viewpoints feel welcome.

3gilch
You are not wrong to complain. That's feedback. But this feels too vague to be actionable.

First, we may agree on more than you think. Yes, groupthink can be a problem, and gets worse over time, if not actively countered. True scientists are heretics. But if the science symposium allows the janitor to interrupt the speakers and take all day pontificating about his crackpot perpetual motion machine, it's also of little value. It gets worse if we then allow the conspiracy theorists to feed off of each other. Experts need a protected space to converse, or we're stuck at the lowest common denominator (incoherent yelling, eventually). We unapologetically do not want trolls to feel welcome here.

Can you accept that the other extreme is bad? I'm not trying to motte-and-bailey you, but moderation is hard. The virtue lies between the extremes, but not always exactly in the center.

What I want from LessWrong is high epistemic standards. That's compatible with opposing viewpoints, but only when they try to meet our standards, not when they're making obvious mistakes in reasoning. Some of our highest-karma posts have been opposing views!

Do you have concrete examples? In each of those cases, are you confident it's because of the opposing view, or could it be their low standards?

Have you read this? https://www.politico.eu/article/rishi-sunak-ai-testing-tech-ai-safety-institute/

"“You can’t have these AI companies jumping through hoops in each and every single different jurisdiction, and from our point of view of course our principal relationship is with the U.S. AI Safety Institute,” Meta’s president of global affairs Nick Clegg — a former British deputy prime minister — told POLITICO on the sidelines of an event in London this month."

"OpenAI and Meta are set to roll out their next batch of AI models imminently. Yet neither has gra... (read more)

RobertM1713

I hadn't, but I just did and nothing in the article seems to be responsive to what I wrote.

Amusingly, not a single news source I found reporting on the subject has managed to link to the "plan" that the involved parties (countries, companies, etc) agreed to.

Nothing in that summary affirmatively indicates that companies agreed to submit their future models to pre-deployment testing by the UK AISI.  One might even say that it seems carefully worded to avoid explicitly pinning the companies down like that.

Yes, I like it! Thanks for sharing that analysis, Gunnar.

Good list. I think I'd use a triangle to organize them: consciousness at the base, then sentience, then, drawing from your list, phenomenal consciousness, followed by intentionality?

2Gunnar_Zarncke
I see it as a hierarchy that results from lower to higher degrees of processing and the resulting abstractions.

Sentience is simple hard-wired behavioral responses to pleasure or pain stimuli and physiological measures.

Wakefulness involves more complex processing such that diurnal or sleep/wake patterns are possible (requires at least two levels).

Intentionality means systematic pursuit of desires. That requires yet another level of processing: different patterns of behavior for different desires at different times, and their optimization.

Phenomenal Consciousness is then the representation of the desire in a linguistic or otherwise communicable form, which is again one level higher.

Self-Consciousness includes the awareness of this process going on.

Meta-Consciousness is then the analysis of this whole stack.

See also https://wiki.c2.com/?LeibnizianDefinitionOfConsciousness

Thank you for asking. 

Generalizing across disciplines, a critical aspect of human-level artificial intelligence, requires the ability to observe and compare. This is a feature of sentience. All sentient beings are conscious of their existence. Non-sentient conscious beings exist, of course, but none who could pass a Turing test or a coffee-making test. That requires both sentience and consciousness.

4Gunnar_Zarncke
Sentience is one facet of consciousness, but it is not the only one and plausibly not the one responsible for "observe and compare", which requires high cognitive function. See my list of facets here:  https://www.lesswrong.com/posts/8szBqBMqGJApFFsew/gunnar_zarncke-s-shortform#W8XBDmjvbhzszEnrJ 

What happens if you shut down power to the AWS or Azure console powering the Foundation model? Wouldn't this be the easiest way to test various hypotheses associated with the Shutdown Problem in order to either verify it or reject it as a problem not worth sinking further resources into?

That's a good example of my point. Instead of a petition, a more impactful document would be a survey of risks and their probability of occurring in the opinion of these notable public figures. 

In addition, there should be a disclaimer regarding who has accepted money from Open Philanthropy or any other EA-affiliated non-profit for research. 

Which makes it an existential risk. 

"An existential risk is any risk that has the potential to eliminate all of humanity or, at the very least, kill large swaths of the global population." - FLI

What aspect of AI risk is deemed existential by these signatories? I doubt that they all agree on that point. Your publication "An Overview of Catastrophic AI Risks" lists quite a few but doesn't differentiate between theoretical and actual. 

Perhaps if you were to create a spreadsheet with a list of each of the risks mentioned in your paper but with the further identification of each as actual or theoretical, and ask each of those 300 luminaries to rate them in terms of probability, then you'd have something a lot more useful. 

2RHollerith
The statement does not mention existential risk, but rather "the risk of extinction from AI".

I looked at the paper you recommended, Zack. The specific section dealing with "how" AGI is developed (section 1.2) skirts around the problem.

"We assume that AGI is developed by pretraining a single large foundation model using self-supervised learning on (possibly multi-modal) data [Bommasani et al., 2021], and then fine-tuning it using model-free reinforcement learning (RL) with a reward function learned from human feedback [Christiano et al., 2017] on a wide range of computer-based tasks. This setup combines elements of the techniques used to tra... (read more)

My apologies for not being clear in my Quick Take, Chris. As Zack pointed out in his reply, I posed two issues.

The first is the obvious parallel, for me, between EA and Judeo-Christian religions. You may or may not agree with me, which is fine. I'm not looking to convince anyone of my point of view. I was merely interested in seeing if others here had a similar POV.

The second issue I raised was what I saw as a failure in the reasoning chain where you go from Deep Learning to Consciousness to an AI Armageddon. Why was that leap of faith so compe... (read more)

2gilch
The argument chain you presented (Deep Learning -> Consciousness -> AI Armageddon) is a strawman. If you sincerely think that's our position, you haven't read enough. Read more, and you'll be better received. If you don't think that, stop being unfair about what we said, and you'll be better received.

Last I checked, most of us were agnostic on the AI Consciousness question. If you think that's a key point to our Doom arguments, you haven't understood us; that step isn't necessarily required; it's not a link in the chain of argument. Maybe AI can be dangerous, even existentially so, without "having qualia".

But neither are we confident that AI necessarily won't be conscious. We're not sure how it works in humans, but it seems to be an emergent property of brains, so why not artificial brains as well? We don't understand how the inscrutable matrices work either, so it seems like a possibility. Maybe gradient descent and evolution stumbled upon similar machinery for similar reasons.

AI consciousness is mostly beside the point. Where it does come up is usually not in the AI Doom arguments, but in questions about what we ethically owe AIs, as moral patients.

Deep Learning is also not required for AI Doom. Doom is a disjunctive claim; there are multiple paths for getting there. The likely-looking path at this point would go through the frontier LLM paradigm, but that isn't required for Doom. (However, it probably is required for most short timelines.)
6Chris_Leong
Oh, they're definitely valid questions. The problem is that the second question is rather vague. You need either to state what a good answer would look like or to explain why existing answers aren't satisfying.

Thank you for the link to that paper, Zack. That's not one that I've read yet. 

And you're correct that I raised two separate issues. I'm interested in hearing any responses that members of this community would like to give to either issue. 

9Viliam
Analogies can be found in many places. FDA prevents you from selling certain kinds of food? Sounds similar to ancient priests declaring food taboos for their followers. Vaccination? That's just modern people performing a ritual to literally protect them from invisible threats. They even believe that a bad thing will happen to them if someone else in their neighborhood refuses to perform the ritual properly.

The difference is that we already have examples of food poisoning or people dying from a disease, but we do not have an example of a super-intelligent AI exterminating humanity. That is a fair objection, but it is also clear why waiting to get the example first might be a wrong approach, so...

One possible approach is to look at smaller versions. What is a smaller version of "a super-intelligent AI exterminating humanity"? If it is "a stupid program doing things its authors clearly did not intend", then every software developer has stories to tell.

This is not the full answer, of course, but I think that a reasonable debate should be more like this.
5Zack_M_Davis
I mean, I agree that there are psycho-sociological similarities between religions and the AI risk movement (and indeed, I sometimes pejoratively refer to the latter as a "robot cult"), but analyzing the properties of the social group that believes that AI is an extinction risk is a separate question from whether AI in fact poses an extinction risk, which one could call Armageddon. (You could spend vast amounts of money trying to persuade people of true things, or false things; the money doesn't care either way.)

Obviously, there's not going to be a "proof" of things that haven't happened yet, but there's lots of informed speculation. Have you read, say, "The Alignment Problem from a Deep Learning Perspective"? (That may not be the best introduction for you, depending on the reasons for your skepticism, but it's the one that happened to come to mind, which is more grounded in real AI research than previous informed speculation that had less empirical data to work from.)
7Chris_Leong
I downvoted this post. I claim it's for the public good. Maybe you find this strange, but let me explain my reasoning.

You've come on Less Wrong, a website that probably has more discussion of this than any other website on the internet. If you want to find arguments, they aren't hard to find. It's a bit like walking into a library and saying that you can't find a book to read. The trouble isn't that you literally can't find any books/arguments; it's that you've got a bunch of unstated requirements that you want satisfied. Now that's perfectly fine, it's good to have standards.

At the same time, you've asked the question in a maximally vague way. I don't expect you to be able to list all your requirements. That's probably impossible, and when it is possible, it's often a lot of work. At the same time, I do believe that it's possible to do better than maximally vague. The problem with maximally vague questions is that they almost guarantee that any attempt to provide an answer will be unsatisfying both for the person answering and the person receiving the answer.

Worse, you've framed the question in such a way that some people will likely feel compelled to attempt to answer anyway, lest people who think that there is such a risk come off as unable to respond to critics. If that's the case, downvoting seems logical. Why support a game where no-one wins?

Sorry if this comes off as harsh, that's not my intent. I'm simply attempting to prompt reflection.

Are there other forums for AI Alignment or AI Safety and Security besides this one where your article could be published for feedback from perspectives that haven't been shaped by Rationalist thinking or EA? 

It would be considerably more difficult; however, hacking wasn't really the behavior that I had in mind. Metzinger's BAAN argument goes to the threat of human extinction, so I was more curious about any research being done on how to shut an AI system down with no possibility of a reboot.

I don't see the practical value of a post that starts off with conjecture rather than reality; i.e., "In a saner world...." 

You clearly wish that things were different, that investors and corporate executives would simply stop all progress until ironclad safety mechanisms were in place, but wishing doesn't make it so. 

Isn't the more pressing problem what can be done in the world that we have, rather than in a world that we wish we had? 

Technically, Harry didn't earn his wealth by defeating Voldemort. His mother earned it by giving her life to protect him. It was the sacrifice born of love that defeated the Killing Curse, one of the few ways it could be defeated. Perhaps that's an example of the Fundamental Attribution Error.

I just read the post that you linked to. He used the word "prediction" once in the entire post, so I'm having trouble understanding how that was meant to be an answer to my question. The same goes for the claim that it's a cornerstone of LessWrong, which, for me, is like asking Christians why they believe in God and having them answer: because the Bible tells me so.

Is a belief a prediction? 

If yes, and a prediction is an act of forecasting, then there must be a way to know if your prediction was correct or incorrect.

Therefore, maybe one requirement for a belief is that ... (read more)

2Kaj_Sotala
While he doesn't explicitly use the word "prediction" that much in the post, he does talk about "anticipated experiences", which around here is taken to be synonymous with "predicted experiences".

Thanks, Trevor. I've bookmarked that link. Just yesterday I started creating a short list of terms for my readers so that link will come in handy. 

2trevor
@Raemon is the superintelligence FAQ helpful as a short list of terms for Caruso's readers?

I've never seen prediction used in reference to a belief before. I'm curious to hear how you arrived at the conclusion that a belief is a prediction. 

For me, a belief is something that I cannot prove but suspect is true, whereas a prediction is something that is based on research. Philip Tetlock's book Superforecasting is a good example. Belief has little to nothing to do with making accurate forecasts, whereas confronting one's biases and conducting deep research are requirements.

I think it's interesting how the word "belief" can simultaneously ... (read more)

5AnnaSalamon
I got this in part from Eliezer's post Make your beliefs pay rent in anticipated experiences.  IMO, this premise (that beliefs should try to be predictions, and should try to be accurate predictions) is one of the cornerstones that LessWrong has been based on.

Hello, I came across this forum while reading an AI research paper where the authors quoted from Yudkowsky's "Hidden Complexity of Wishes." The linked source brought me here, and I've been reading some really exceptional articles ever since. 

By way of introduction, I'm working on the third edition of my book "Inside Cyber Warfare" and I've spent the last few months buried in AI research, specifically in the areas of safety and security. I view AGI as a serious threat to our future for two reasons. One, neither safety nor security have ever been p... (read more)

2trevor
Hi Jeffrey! Glad to see more cybersecurity people taking the issue seriously. Just so you know, the best introduction to AGI risk I know of for laymen is Scott Alexander's Superintelligence FAQ. This will come in handy down the line.
2habryka
Welcome! Hope you have a good time!

What do you estimate the time lag to be if AI startup "Ishtar" utilized your proposed safety method while every other AI startup that Ishtar was competing with in the medical sector didn't?

It also seems to me like medical terminology would be extremely hard to translate into an ancient language where those words weren't known. The fine-tuning needed to correct for those errors would have to be factored into that time lag as well.

Would a bad actor be able to craft Persuasive Adversarial Prompts (PAP) in Sumerian and have the same impact as they did ... (read more)