These models are remarkable. They can match or exceed human performance on countless tasks, such as diagnosing cancers from visual slides better than any human. And yet a recent study found a surprising result: all major MLLMs currently struggle to tell time on analog clocks. According to the study, GPT-4o read clock faces correctly only 8% of the time. Claude 3.5 Sonnet did worse, at 6%. Gemini 2.0 was the best, but still managed only 20%.