BlackSquareFoundation

Forgotten Polygons: Multimodal Large Language Models are Shape-Blind - err, oh, no they're not

Summary

In short, the recently published paper "Forgotten Polygons: Multimodal Large Language Models are Shape-Blind" was blown out of the water by OpenAI's o3 within a few days of its publication.

In Depth

The most problematic example in the paper, Arrow-Plus, which scored 0% there, was solved perfectly by o3.

o3's output:

Thought for 2m 30s
The outline consists of 15 straight‑line segments, so the shape is a 15‑sided polygon (a pentadecagon).

Further Reading

  1. Forgotten Polygons: Multimodal Large Language Models are Shape-Blind