
Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study

Great work here, but I do feel that the only observations that matter in practice are those about reasoning. To the extent that obtaining visual information is the problem, I think the current design of language models is just not representative of how this task would be implemented in real robotics applications, for at least two reasons:

The model is not using anywhere near all of the information about an image that it could, as language models that accept image data are just accepting an embedding that is far smaller (in an information-theoretic sense)

...

3 Adam Karvonen 12d

I don't think image understanding is the bottleneck. o3 and o4-mini-high seem like a meaningful improvement in vision, to the point where it's almost good enough for this part, but they still fail miserably at the physical reasoning aspects. This person got o4-mini-high to generate a reasonably close image depiction of the part: https://x.com/tombielecki/status/1912913806541693253

OpenAI o1

ashesfall7mo10

This is also how I interpreted it.

There should be more AI safety orgs

ashesfall2y50

It would be great if there were more options. I would absolutely leave my current job, and bring my ML experience with me, for a role in AI safety. I would be okay with taking a pay cut to do it. This doesn’t seem like an option to me though, after a brief bit of searching on and off over the last year.

5 robm 2y

I have similar feelings: there's no clear path for someone in an adjacent field. I chose my current role largely based on the expected QALYs, and I'd gladly move into AI Safety now for the same reason. This post gives the impression that finding talent is not the current constraint, but I'm confused about why the listed salaries are so high for some of these roles if the pool is so large. I've submitted applications to a few of these orgs, with cover letters that basically say "I'm here and willing if you need my skills". One frustration is recognizing Alignment as our greatest challenge and not having a path to go work on it. Another is that the current labs look somewhat homogeneous and a lot like academia, which is not how I'd optimize for speed.

Lies, Damn Lies, and Fabricated Options

ashesfall2y30

Great essay. Though I would note that “price gouging” usually refers to the scenario described: when the change in price is possible only by virtue of seller market power. I think the term is misused enough that it makes sense to present the example as is, but I would call it a pretty basic error in terminology to refer to the scenario, absent seller market power, as price gouging.

LESSWRONG

All of ashesfall's Comments + Replies