All of A_Posthuman's Comments + Replies

Answer by A_Posthuman

Yes, there appears to already be work in this area. Here is a recent example I ran across on Twitter, showing videos of two relatively low-cost robot arms learning various very fine manipulation tasks, apparently after only around 15 minutes of demonstrations:

Introducing ACT: Action Chunking with Transformers

https://twitter.com/tonyzzhao/status/1640395685597159425

 

Related website:

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

https://tonyzhaozh.github.io/aloha/
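
For a sense of how this kind of demonstration-based learning works mechanically: the core idea in ACT is to train a policy that predicts a short chunk of future actions at once, via plain supervised learning on the demonstrations. Below is a minimal, hypothetical sketch of that chunking idea in PyTorch; the class name, dimensions, and architecture are illustrative stand-ins, not the actual ALOHA/ACT code (the real model is a CVAE over a transformer conditioned on camera images).

```python
import torch
import torch.nn as nn

class ChunkedPolicy(nn.Module):
    """Toy policy that predicts a chunk of K future actions from the
    current observation, rather than one action per step."""
    def __init__(self, obs_dim=14, act_dim=14, chunk=8, d_model=128):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, chunk * act_dim)
        self.chunk, self.act_dim = chunk, act_dim

    def forward(self, obs):                             # obs: (batch, obs_dim)
        h = self.encoder(self.embed(obs).unsqueeze(1))  # (batch, 1, d_model)
        return self.head(h[:, 0]).view(-1, self.chunk, self.act_dim)

# Training is plain behavioral cloning: regress onto demonstrated chunks.
policy = ChunkedPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
obs = torch.randn(32, 14)                # stand-ins for recorded demo data
target_chunks = torch.randn(32, 8, 14)   # next 8 demonstrated actions
loss = nn.functional.mse_loss(policy(obs), target_chunks)
opt.zero_grad(); loss.backward(); opt.step()
```

Predicting whole chunks rather than single actions is what lets a simple supervised setup stay stable on long fine-grained tasks, since the policy commits to a coherent short trajectory instead of dithering step by step.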

"Fundamentally incapable" is perhaps putting things too strongly, when you can see from the Reflexion paper and other recent work in the past 2 weeks that humans are figuring out how to work around this issue via things like reflection/iterative prompting:

 

https://nanothoughts.substack.com/p/reflecting-on-reflexion

https://arxiv.org/abs/2303.11366

 

Using this simple approach lets GPT-4 jump from 67% to 88% correct on the HumanEval benchmark.
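
To make the mechanism concrete, here is a hypothetical sketch of a Reflexion-style loop, not the paper's actual code: `llm` and `run_tests` are assumed stand-ins for a chat-model call and a unit-test harness. The model tries, gets evaluator feedback, writes a verbal self-reflection, and retries with those reflections prepended to the prompt.

```python
# Minimal sketch of a Reflexion-style loop (illustrative, not the
# paper's implementation). `llm` and `run_tests` are hypothetical:
# llm(prompt) -> str, run_tests(code) -> (passed: bool, feedback: str).

def reflexion_solve(task, llm, run_tests, max_rounds=4):
    attempt = llm(f"Write a solution for:\n{task}")
    memory = []                      # accumulated self-reflections
    for _ in range(max_rounds):
        ok, feedback = run_tests(attempt)
        if ok:
            return attempt           # passed the evaluator, done
        # Ask the model to critique its own failure in natural language...
        reflection = llm(
            f"Task:\n{task}\nAttempt:\n{attempt}\n"
            f"Test feedback:\n{feedback}\n"
            "Briefly explain what went wrong and how to fix it."
        )
        memory.append(reflection)
        # ...then retry with the reflections included in the prompt.
        attempt = llm(
            "Lessons from earlier failed attempts:\n"
            + "\n".join(memory)
            + f"\n\nWrite an improved solution for:\n{task}"
        )
    return attempt                   # best effort after max_rounds
```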

 

So I believe the lesson is: "limitations" in LLMs may turn out to be fairly easily enhanced away by cle...

I think this is a good idea, and as someone who has recorded themselves 16 hrs/day for 10+ years now, I can say that recording yourself becomes very routine and easy.

Is it just me, or does this validate some parts of Yann LeCun's "A Path Towards Autonomous Machine Intelligence" paper?

Both papers use an architecture consisting of multiple specialized models, with DreamerV3 using three models that seem very similar to those described by LeCun:

"the world model predicts future outcomes of potential actions"

"the critic judges the value of each situation"

"the actor learns to reach valuable situations"

 

World model, critic, actor - all are also described in LeCun's paper. So are we seeing a successful push towards an AGI similar to LeCun's ideas?
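
For concreteness, here is a rough sketch of how those three components fit together; this is an assumed toy layout, not DreamerV3's actual code (which uses a recurrent state-space model trained from pixels). The world model rolls forward in "imagination", the critic scores the imagined states, and the actor is trained to steer toward high-value ones.

```python
import torch
import torch.nn as nn

# Toy sketch of the world-model / critic / actor split described above.
obs_dim, act_dim, latent = 8, 2, 32

world_model = nn.GRUCell(obs_dim + act_dim, latent)           # predicts next latent state
critic = nn.Linear(latent, 1)                                 # judges the value of a situation
actor = nn.Sequential(nn.Linear(latent, act_dim), nn.Tanh())  # picks actions

def imagine(h, horizon=5):
    """Roll the world model forward on the actor's own actions,
    without touching the environment ('learning in imagination')."""
    values = []
    for _ in range(horizon):
        a = actor(h)
        # World model predicts the next latent from (observation placeholder, action).
        h = world_model(torch.cat([torch.zeros(h.size(0), obs_dim), a], -1), h)
        values.append(critic(h))
    return torch.stack(values)

# The actor is trained to maximize the critic's imagined values, while the
# critic regresses onto returns and the world model onto observed data.
h0 = torch.zeros(1, latent)
actor_loss = -imagine(h0).mean()
```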

Multicore
I'm not familiar with LeCun's ideas, but I don't think the idea of having an actor, critic, and world model is new in this paper. For a while, most RL algorithms have used an actor-critic architecture, including OpenAI's old favorite PPO. Model-based RL has been around for years as well, so probably plenty of projects have used an actor, critic, and world model. Even though the core idea isn't novel, this paper getting good results might indicate that model-based RL is making more progress than expected, so if LeCun predicted that the future would look more like model-based RL, maybe he gets points for that.