Andrew Deece — LessWrong

Agents trained with general deep RL algorithms such as self-play (SP) and population-based training (PBT) in collaborative environments are very good at coordinating with copies of themselves. They handle human partners poorly, since they never see humans during training.

Test environment: “Overcooked”, a video game in which players control chefs in a kitchen to cook and serve dishes. Each dish takes several high-level actions, and the challenge is motion coordination.

Agents should learn to (1) navigate the map, (2) interact with objects, (3) drop objects off in the right locations, (4) serve completed dishes to the serving area, and (5) maintain spatial awareness and coordinate with their partner.

Takeaways: agents that were explicitly designed to work well with a human model achieved significantly better performance. The PPO BC agent learns to use both pots and can take both leader and follower roles. The limitation is that the BC (behavior cloning) model is not adaptive, so an agent trained with it cannot learn to take advantage of human adaptivity. Real humans learn throughout the episode to anticipate what will happen next.
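
To make the "not adaptive" point concrete, here is a minimal sketch of behavior cloning (BC) in its simplest tabular form. It is an illustration only and not the paper's implementation; the function name `fit_bc_policy` and the state and action labels are hypothetical. The fitted policy is a fixed mapping from state to action, so it cannot change its behavior during an episode the way a real human partner does.

```python
from collections import Counter, defaultdict


def fit_bc_policy(demos):
    """Tabular behavior cloning: for each state, keep the action
    the demonstrator chose most often in that state."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Hypothetical (state, action) pairs logged from a human player.
demos = [
    ("holding_onion", "go_to_pot"),
    ("holding_onion", "go_to_pot"),
    ("holding_onion", "wait"),
    ("soup_ready", "fetch_dish"),
]

policy = fit_bc_policy(demos)
print(policy["holding_onion"])
```

Once fitted, `policy` never updates, which is the sense in which a BC partner is a static stand-in for a human.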

I really enjoyed the paper; agents learning to collaborate with humans rather than compete with them is a positive direction.
