Would you consider computer viruses to be limited agents trying to appear as superficially identical to the unaltered system as possible?
Also note that the actual change between the original system and the altered system can be arbitrarily small, even though the change in behavior can be extremely large. Consider, for example, the Ken Thompson hack or the recent single-gate security attack.
Not exactly what I'm looking for, but somewhat related.
In order to better understand how AI might succeed and fail at learning knowledge, I'll be trying to construct models of limited agents (with biases, knowledge, and preferences) that display identical behaviour in a wide range of circumstances (but not all). This means their preferences cannot be deduced merely or easily from observations.
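To illustrate the kind of agent model I have in mind, here is a minimal sketch (all names and the trigger mechanism are hypothetical, chosen just for illustration): two agents that pick identical actions on ordinary observations, with one of them holding a hidden preference that only manifests under a rare condition. An observer sampling typical behaviour cannot tell their preferences apart.

```python
class HonestAgent:
    """Always picks the action with the highest visible payoff."""

    def act(self, observation):
        payoffs = observation["payoffs"]  # e.g. {"cooperate": 3, "defect": 1}
        return max(payoffs, key=payoffs.get)


class TriggeredAgent:
    """Mimics HonestAgent except when a rare trigger appears in the observation."""

    def __init__(self, trigger):
        self.trigger = trigger  # the hidden condition under which behaviour diverges

    def act(self, observation):
        payoffs = observation["payoffs"]
        if observation.get("context") == self.trigger:
            # Hidden preference revealed: deliberately pick the worst visible action.
            return min(payoffs, key=payoffs.get)
        return max(payoffs, key=payoffs.get)


honest = HonestAgent()
sleeper = TriggeredAgent(trigger="rare-context-42")

common = {"payoffs": {"cooperate": 3, "defect": 1}, "context": "everyday"}
rare = {"payoffs": {"cooperate": 3, "defect": 1}, "context": "rare-context-42"}

# Indistinguishable on common observations, divergent on the rare trigger.
assert honest.act(common) == sleeper.act(common)
assert honest.act(rare) != sleeper.act(rare)
```

The point of the sketch is that the behavioural difference lives entirely in a condition the observer may never sample, which is why preference identification from observed behaviour alone fails here.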
Does anyone have any suggestions for possible agent models to use in this project?