Do multimodal LLMs (like 4o) use OCR under the hood to read dense text in images?
SOTA multimodal LLMs can read text from images (e.g. signs, screenshots, book pages) remarkably well. Are they actually running an internal OCR system, or do they learn to "read" purely through pretraining (e.g. contrastive learning on image-text pairs)?
Jun 15, 2025