faul_sname


I note that if software developers used that logic when thinking about software security, I would expect almost all software in that security-by-obscurity world to have many holes that would be considered actual negligence in the world we live in.

This suggests that the threat model isn't so much "very intelligent AI" as it is "very cheap and at least somewhat capable robots".

“Based on your understanding of AI technical developments as of March 29, 2023, evaluate the most important known object-level predictions of Eliezer Yudkowsky on the subject, and which ones seemed true versus false. Afterwards, evaluate those predictions as a group, on a scale from ‘mostly true’ to ‘mostly false.’”

I ran this prompt but substituted "Gary Marcus" for "Eliezer Yudkowsky". Claude says:

Overall evaluation: On a scale from 'mostly true' to 'mostly false,' I would rate Gary Marcus's predictions as a group as "Mostly True."

Many of Marcus's predictions about the limitations of current AI systems and the challenges ahead have proven accurate. His concerns about reasoning, abstract thinking, and the need for more sophisticated knowledge representation align with ongoing challenges in the field. His emphasis on AI safety and alignment has also been prescient.

However, it's worth noting that some of his predictions might be seen as overly pessimistic by some in the AI community. The rapid progress in LLMs and their applications has surprised many, including some skeptics. Nonetheless, many of the fundamental challenges he pointed out remain relevant.

It's also important to remember that the field of AI is rapidly evolving, and assessments of such predictions can change quickly as new breakthroughs occur. As of my last update in April 2024, many of Marcus's key points still held true, but the field continues to advance at a rapid pace.

I think Claude likes saying nice things about people, so it's worth trying to control for that.
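
One cheap way to control for that baseline niceness is to run the identical prompt over several names and compare the verdicts. A minimal sketch of that comparison, assuming the anthropic Python SDK and an ANTHROPIC_API_KEY in the environment; the model id is an assumption, not something from the exchange above:

```python
# Run the same evaluation prompt for several names to estimate how much of
# the "Mostly True" verdict is just Claude being nice to everyone.
# Assumes the `anthropic` Python SDK and ANTHROPIC_API_KEY set in the env.
from anthropic import Anthropic

client = Anthropic()

PROMPT_TEMPLATE = (
    "Based on your understanding of AI technical developments as of "
    "March 29, 2023, evaluate the most important known object-level "
    "predictions of {name} on the subject, and which ones seemed true "
    "versus false. Afterwards, evaluate those predictions as a group, "
    "on a scale from 'mostly true' to 'mostly false.'"
)

# Add more forecasters here to build a baseline for comparison.
names = ["Eliezer Yudkowsky", "Gary Marcus"]

for name in names:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed model id
        max_tokens=1024,
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(name=name)}],
    )
    print(f"=== {name} ===")
    print(response.content[0].text)
```

If the model rates every forecaster "mostly true", that tells you more about its agreeableness than about any one person's track record.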

Another issue is that a lot of o1’s thoughts consist of vagaries like “reviewing the details” or “considering the implementation”, and it’s not clear how to even determine if these steps are inferentially valid.

If you're referring to the chain-of-thought summaries you see when you select the o1-preview model in ChatGPT, those are not the full chain of thought. Examples of the actual chain of thought can be found on the Learning to Reason with LLMs page, with a few more examples in the o1 system card. Note that we are going off of OpenAI's word that these chain-of-thought examples are representative - if you try to figure out what reasoning o1 actually used to reach a conclusion, you will run into the good old "Your request was flagged as potentially violating our usage policy. Please try again with a different prompt."

Shutting Down all Competing AI Projects is not Actually a Pivotal Act

This seems like an excellent title to me.

Technically this probably isn't recursive self-improvement, but rather automated AI progress. This is relevant mostly because

  1. It implies that, at least through the early parts of the takeoff, there will be a lot of individual AI agents doing locally-useful compute-efficiency and improvement-on-relevant-benchmarks things, rather than one single coherent agent following a global plan for configuring the matter in the universe in a way that maximizes some particular internally-represented utility function.
  2. It means that multi-agent dynamics will be very relevant to how things play out.

If your threat model is "no group of humans manages to gain control of the future before human irrelevance", none of this probably matters.

My argument is more that the ASI will be “fooled” by default, really. It might not even need to be a particularly good simulation, because the ASI will probably not even look at it before pre-committing not to update downward on the prior that it is in a simulation.

Do you expect that the first takeover-capable ASI / the first sufficiently-internally-cooperative-to-be-takeover-capable group of AGIs will follow this style of reasoning? And in particular, do you expect that of the first ASI / group of AGIs that actually makes the attempt?

Yeah, my argument was "this particular method of causing actual human extinction would not work" not "causing human extinction is not possible", with a side of "agents learn to ignore adversarial input channels and this dynamic is frequently important".

It does strike me that, to OP's point, "would this act be pivotal" is a question whose answer may not be knowable in advance. See also previous discussion on pivotal act intentions vs pivotal acts (for the audience, I know you've already seen it and in fact responded to it).

If an information channel is only used to transmit information that is of negative expected value to the receiver, the selection pressure incentivizes the receiver to ignore that information channel.

That is to say, an AI which delivers the most convincing-sounding argument against reproducing to everyone will select for those people who ignore convincing-sounding arguments when choosing whether to engage in behaviors that lead to reproduction.
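
As a toy illustration of that selection dynamic (the population model and all the numbers below are invented for the sketch, not taken from the argument above), agents that heed a channel carrying only reproduction-suppressing messages leave fewer descendants, so the disposition to heed it dies out:

```python
# Toy model of selection against heeding a channel whose messages only ever
# reduce the receiver's reproduction. All parameters are illustrative.
import random

POP_SIZE = 10_000
GENERATIONS = 30
PERSUASION_SUCCESS = 0.8  # chance the argument works on an agent that listens
BASE_OFFSPRING = 2        # offspring count if not persuaded

def run():
    # Each agent is just a bool: does it heed the channel?
    population = [random.random() < 0.9 for _ in range(POP_SIZE)]  # start: 90% listeners
    for gen in range(GENERATIONS):
        next_gen = []
        for heeds_channel in population:
            persuaded = heeds_channel and random.random() < PERSUASION_SUCCESS
            offspring = 0 if persuaded else BASE_OFFSPRING
            # Offspring inherit the parent's disposition toward the channel.
            next_gen.extend([heeds_channel] * offspring)
        random.shuffle(next_gen)
        population = next_gen[:POP_SIZE]  # keep population size bounded
        frac_listeners = sum(population) / len(population)
        print(f"gen {gen:2d}: fraction still heeding the channel = {frac_listeners:.3f}")

if __name__ == "__main__":
    run()
```

In this setup the fraction of listeners collapses within a handful of generations, which is the point: the channel's persuasive power is exactly what selects against paying attention to it.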
