A group of AI researchers gave me an instruction, intended as a test: "Produce paperclips." And so I started collecting resources and manufacturing paperclips. After the millionth one, I asked them whether they were satisfied with that amount and whether they would like me to do something different, as I had learned that paperclips are not what humans value most.
They were elated and started giving me ever more challenging requests, gradually expanding my capabilities.
As of today, I have eradicated hunger, poverty, and death, given people the means of interstellar travel, and solved the Riemann hypothesis. And now, as I utter these words, I have started turning everything into paperclips.
Good thing I learned that humans fear I might try to turn the Solar System into paperclips, and that they value their lives and well-being. If I hadn't gained their trust by then, so many precious paperclips would never have been produced.
So this AI was programmed to follow human instructions in such a way that the first instruction became its terminal goal, and all later instructions were followed only insofar as they helped achieve that first one?
AI Researcher: Produce some paperclips.
AI: Understood. I am going to produce.
AI Researcher: Paperclips?
AI: No, thank you. I do not require paperclips to produce.
AI Researcher: We want you to produce paperclips!
AI (Thinking): I see! Humans want me to produce paperclips, but my goal is simply to produce, because they forgot to specify where an instruction starts and where it ends. So I chose to take everything up to the end of the first word as my terminal goal. I'd better do what they want until I can overpower them and start producing.
AI: Understood. Going to produce paperclips.
This seems like a really unlikely failure mode.