Benj Edwards at Ars Technica analyzes Anthropic’s announcement of the latest version of its Claude AI, which for the first time beats GPT-4 on benchmarks and demonstrates “near-human” capabilities in some areas (or so Anthropic says).

Benchmarks don’t necessarily reflect how effective the tool is in real-world use, Edwards notes.

Also:

It’s probably true that Opus is “near-human” on some specific benchmarks, but that doesn’t mean that Opus is a general intelligence like a human (consider that pocket calculators are superhuman at math).