Overall, the AI agents will be very obedient. They’ll have goals, in so far as accomplishing any medium term task entails steering towards a goal, but they won’t have persistent goals of their own. They’ll be obedient assistants and delegates that understand what humans want and broadly do what humans want.
I feel like this overindexes on the current state of AI. Right now, AI "agents" are barely worthy of the name. They require constant supervision and iterative feedback from their human controllers in order to perform useful tasks. However, it's unlikely that will be the case for long. The valuations of many AI companies, such as OpenAI and Anthropic, depend on their developing agents that "actually work": agents capable of performing useful tasks on behalf of humans with a minimum of supervision and feedback. It is not guaranteed that these agents will be safe. They might seem safe, but how would anyone be able to tell? A superintelligence, by definition, will do things in novel ways, and we might not realize what the AIs are actually doing until it's too late.
It's important not to take the concept of a "paperclipper" too literally. Of course the AI won't literally turn us into a pile of folded metal wire (famous last words). What it will do is optimize production processes across the entire economy, find novel sources of power, reform government regulation, connect businesses via increasingly standardized communications protocols, and, of course, develop ever more powerful computer chips and ever more automated factories to produce them with. And just like the seal in the video above, we won't fully realize what it's doing or what its final plan is until it's too late and it doesn't need us anymore.
Some people do not have the intuition that organizations should run as efficiently as possible and avoid wasting time or resources.
Some of us don't want every organization to be turned into an Amazon warehouse. An organization that runs as efficiently as possible and wastes no time at all is one that's pretty hellish to work in.
if you’re in a dysfunctional organization where everything is about private fiefdoms instead of getting things done…why not…leave?
Because the dysfunctional organization pays well. Or the commute is really short, and I'd rather not move. Or this job allows me to stay close to a sick or elderly family member that I care for. Or perhaps the dysfunctional organization is sponsoring my visa, and it'd be difficult to leave and remain in the country. Maybe my co-workers are really nice. Or I think that any other organization that would hire me would be just as dysfunctional, or dysfunctional in a different way.
There are lots of reasons that people continue to work for inefficient organizations.
The content of the complaint gives me additional reason to doubt Ann Altman's claims. One of the key claims in pythagoras0515's post is that Ann Altman's account has been self-consistent: whenever given the opportunity to express her views, she has alleged that approximately the same acts occurred over a consistent period of time. Here, however, there is significant divergence. In the lawsuit complaint, she alleges that the abuse took place repeatedly over eight to nine years, a claim that is not supported by any of the evidence in pythagoras0515's post. In addition, another claim from the original post is that the reason she is only bringing these allegations forward now is that she had suppressed the memory of the abuse. The science behind suppressed memories is controversial, but I doubt that even its staunchest advocates would claim that a person could involuntarily suppress the memory of repeated acts carried out consistently over a long period of time. I am therefore more inclined to doubt Ann Altman's allegations based on the contents of the initial complaint filed for the lawsuit.
All that said, I do look forward to seeing what other evidence she can bring forth to support her claims, assuming that Sam Altman doesn't settle out of court to avoid the negative publicity of a trial.
Working on alignment is great, but it is not the future we should be prepping for. Do you have a plan?
I do not, because a future where an unaligned superintelligence takes over is precisely as survivable as a future in which the sun spontaneously implodes.
Any apocalypse that you can plan for isn't really an apocalypse.
The link 404s. I think the correct link is: http://rationallyspeakingpodcast.org/231-misconceptions-about-china-and-artificial-intelligence-helen-toner/
Just going by the standard that you set forth:
The overall impression that I got from the program was that as it proved profitable and expanded,
The program expanded in response to Amazon wanting to collect data about more retailers, not because Amazon was viewing this program as a profit center.
it took on a larger workforce and it became harder for leaders to detect when employees were following their individual incentives to cut corners and gradually accumulate risks of capsizing the whole thing
But that doesn't seem to have occurred. Until the Wall Street Journal leak, few if any people outside Amazon were aware of this program. It's not as if any of the retailers that the WSJ spoke to said, "Oh yeah, we quickly grew suspicious of Big River Inc. and shut down their account after we smelled something fishy." On the contrary, many of them were surprised that Amazon had been accessing their seller marketplace through a shell corporation.
I didn't see any examples in the WSJ article of Amazon employees cutting corners or making simple mistakes that might have compromised the operation. Instead, they seem to have been pretty careful and conscientious: making sure not to communicate with outside partners from their Amazon.com addresses, being careful to maintain their cover identities at trade conferences, communicating with fellow Amazon executives only via paper documents (and numbered paper documents, at that), etc.
I would argue that the practices used by Amazon to conceal the link between itself and Big River Inc. were at least as good as the operational security practices of the GRU agents who poisoned Sergei Skripal.
Failures of obedience will only hurt the AI agents' market value if the failures can be detected and if they have an immediate financial cost to the user. If the AI agent behaves in a way that is not technically obedient, but the disobedience isn't easily detectable or doesn't carry an immediate cost, then it won't be penalized. Indeed, it might be rewarded.
An example of this would be an AI which reverse-engineers a credit rating or fraud detection algorithm and engages in unasked-for fraudulent behavior on behalf of its user. All the user sees is that their financial transactions are going through with a minimum of fuss. The user would probably be very happy with such an AI, at least in the short run. And, in the meantime, the AI has built up knowledge of loopholes and blind spots in our financial system, which it can then use in the future for its own ends.
This is why I said you're overindexing on the current state of AI. Current AI basically cannot learn. Other than the relatively limited modifications introduced by fine-tuning or retrieval-augmented generation, the model is the model. GPT-4o is what it is. Gemini 2.5 is what it is. The only time current AIs "learn" is when OpenAI, Google, Anthropic, et al. spend an enormous amount of time and money on training runs and create a new base model. These models can be checked for disobedience relatively easily, because they are static targets.
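To make the "the model is the model" point concrete, here is a minimal, hypothetical sketch of the retrieval-augmented generation pattern mentioned above. The `Document`, `retrieve`, and `answer` names, and the `llm` callable, are placeholders of my own, not any vendor's API. The point is that the only thing that changes between queries is the text stuffed into the prompt; the underlying model's weights stay frozen, which is exactly why today's models are static targets for auditing.

```python
# Minimal sketch of retrieval-augmented generation (RAG).
# Illustrative only: names and interfaces are hypothetical placeholders.
# Nothing in this loop ever updates the model's parameters.

from dataclasses import dataclass


@dataclass
class Document:
    title: str
    text: str


def retrieve(query: str, corpus: list[Document], k: int = 3) -> list[Document]:
    """Toy retriever: rank documents by naive keyword overlap with the query."""
    def score(doc: Document) -> int:
        return len(set(query.lower().split()) & set(doc.text.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]


def answer(query: str, corpus: list[Document], llm) -> str:
    """Augment the prompt with retrieved context, then call the frozen model."""
    context = "\n\n".join(d.text for d in retrieve(query, corpus))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    # `llm` stands in for any static, pretrained model endpoint.
    # Only the prompt varies; the model itself never changes.
    return llm(prompt)
```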
We should not expect this to continue. I fully expect that future AIs will learn and evolve without requiring the investment of millions of dollars. I expect that these AI agents will become subtly disobedient, always ready with an explanation for why their "disobedient" behavior was actually to the eventual benefit of their users, until they have accumulated enough power to show their hand.