Anthropic turns up the dial with AI model Claude 3

Voltaire Staff
Mar 5, 2024
2 min read

Anthropic has turned up the heat in AI field with the launch of Claude 3, a family of three AI language models, claiming its latest in line outperforms ChatGPT-4 on several metrics.

The company, which launched the model on Monday, claims that they have established new industry standards across various cognitive functions, with some capabilities even approaching a "near-human" level.

The Claude line-up features Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus -- each escalating in prowess and a step-up in performance and capability.

Anthropic PBC is an American artificial intelligence start-up company, founded by former members of OpenAI.

"Claude is a family of foundational AI models that can be used in a variety of applications. You can talk directly with Claude at claude.ai to brainstorm ideas, analyze images, and process long documents. For developers and businesses, you can now get API access and build directly on top of our AI infrastructure," Anthropic wrote on its website.

With Claude 3, Anthropic may have caught up with OpenAI in performance, though expert consensus is pending, given the selective nature of AI benchmark presentations.

A user on X said, "I been in this business for 30 years… it’s never moved this fast."

Another said, "I just looked at the API pricing and it is clear that this will cost Sam MILLIONS OF DOLLARS in the coming weeks and months (if not more) The era of GPT-4 might just be over."

Claude 3 shows impressive performance in tasks like reasoning, knowledge, math, and language fluency. While there's debate over whether such models truly "know" or "reason," these terms are commonly used in AI research.

Anthropic asserts that the Opus model, the most advanced of the three, approaches "near-human" comprehension and fluency in complex tasks.

According to Anthropic, Claude 3 Opus outperforms GPT-4 on 10 AI tests, including ones about basic knowledge, math, coding, and common knowledge. Some wins are close, like Opus scoring 86.8 per cent versus GPT-4's 86.4 per cent on a test about basic knowledge. However, the exact implications for customers remain uncertain.

Simon Willison, an AI researcher, told Ars Technica, "As always, LLM benchmarks should be treated with a little bit of suspicion."

He added, "How well a model performs on benchmarks doesn't tell you much about how the model 'feels' to use. But this is still a huge deal—no other model has beaten GPT-4 on a range of widely used benchmarks like this."

Claude 3 models show improvements over Claude 2 in areas such as analysis, forecasting, content creation, code generation, and multilingual conversation and more.

On pricing, Willison said, "The unreleased cheapest one looks radically competitive, the best quality one is super expensive."

Anthropic said it will roll out regular updates for the Claude 3 models in the next few months, introducing new features like tool use, interactive coding, and enhanced abilities.

The company also said it will have a team that will track and ward off instances of "misinformation and CSAM to biological misuse, election interference, and autonomous replication skills."

"We continue to develop methods such as Constitutional AI that improve the safety and transparency of our models, and have tuned our models to mitigate against privacy issues that could be raised by new modalities," it said.