Are large language models dyslexic?

These models are remarkable. They can match or exceed human performance on countless tasks; for example, they can diagnose cancers from slide images better than human experts. And yet a recent study found a surprising result: all major multimodal large language models (MLLMs) currently struggle to tell time on analog clocks. According to the study, GPT-4o read clock faces correctly only 8% of the time, and Claude-3-5-sonnet did worse, at 6%. Gemini 2.0 performed best, but still managed only 20%.