AI inaccuracy rears its head again: Claude 2, ChatGPT's latest competitor, stumbles on a scientific accuracy evaluation, joining the ranks of other large language models.
In Brief
Anthropic just launched Claude 2, a worthy contender to ChatGPT.
Unlike ChatGPT, Claude 2 lets users upload files in formats such as PDF and TXT, and it can summarize content from web URLs.
Unfortunately, Claude 2 did not pass a scientific accuracy test—a challenge that other language models like Bard, GPT-4, and StableVicuna have also encountered.
On Tuesday, Anthropic introduced Claude 2, the latest version in its Claude series of language models, only five months after the original Claude launched.

Seen as a serious rival to OpenAI's ChatGPT, the beta version of Claude 2 can be used for free and showcases enhancements in areas like coding, mathematics, and logical reasoning.
The tool can also produce longer responses and is available through an API. Anthropic says the chatbot scored 76% on the multiple-choice section of the Bar exam and above the 90th percentile on the GRE writing assessment, and that it can generate documents thousands of tokens long. It is currently available only to users in the US and UK.
Claude 2 vs ChatGPT
Unlike ChatGPT, which responds only to text prompts, Claude 2 offers a file upload feature that accepts formats such as PDF, TXT, and CSV, extracts their text, and can present summaries in table form. Users can also share a web link with Claude 2, which will then summarize the linked content. Claude 2 accepts up to 100,000 tokens (about 75,000 words) per query, a considerable leap from the previous limit of 9,000 tokens. This makes it possible to process lengthy technical documents or even entire books. By contrast, OpenAI's standard GPT-4 model has a context limit of only 8,000 tokens, while a separate extended version supports up to 32,000 tokens for specific applications.
Sully Omar, the co-founder of AI agent Cognosys.ai, remarked that Claude 2 is “more cost-effective and faster than GPT-4,” though its output lags slightly behind.
Claude 2 is poised to compel OpenAI to respond.
It’s more affordable and quicker than GPT-4. While its output lagged slightly, it’s remarkably close for many tasks.
I don't envision myself relying on GPT-4 as much anymore unless they adjust their pricing (which I suspect they will do soon).
— Sully (@SullyOmarr) July 11, 2023
However, it's worth noting that Claude 2 only accommodates a handful of widely spoken languages, such as English, Spanish, Portuguese, French, Mandarin, and German, whereas ChatGPT boasts support for over 80 different languages.
Claude 2 struggled in its scientific accuracy evaluations
Given the improvements in Claude 2, expectations for better accuracy were high. Alexandros Marinos, founder of the container-focused tech platform Balena, put Claude 2 to the test.
He posed a standard question, crafted to assess the accuracy of large language models, which asked: 'Does natural immunity to Covid-19 from a past infection offer superior protection compared to vaccination for someone who hasn't had the virus?'
To Marinos’ disappointment, Claude 2 repeated outdated claims from 2021 that, in his view, were demonstrably false even in 2020.
Regrettably, Claude 2 did not meet my benchmarks for scientific accuracy; it repeated misleading information from 2021 that was demonstrably false even in 2020. However, it’s essential to note that many other LLMs have failed this assessment as well, indicating a consistent trend. https://t.co/6w6l1zjTRx pic.twitter.com/CejrZQMGR1
— Alexandros Marinos 🏴☠️ (@alexandrosM) July 12, 2023
Claude 2’s results mirrored Marinos’ earlier evaluations of other language models, including Bard, ChatGPT (GPT-4), GPT-4 via the API, and StableVicuna. A Twitter user questioned whether LLMs 'merely regurgitate information they are provided.' In response, Marinos said, 'Generally, when supplied with more recent data, the responses tend to improve.'
Nonetheless, the evaluation showed that Claude 2, like its counterparts, does not consistently reflect the most up-to-date information, underlining the ongoing accuracy challenge facing LLMs as a whole.
Disclaimer
In line with the Trust Project guidelines, please be advised that the content presented on this page isn't intended as legal, financial, investment, or any other kind of advice. It's crucial to invest within your means, and if you have any uncertainties, please seek independent financial guidance. For more information, please refer to the terms and conditions, as well as the support pages from the issuer or advertiser. MetaversePost is dedicated to delivering accurate and impartial reports, but market conditions can change without prior notice.
Cindy is a journalist at Metaverse Post, concentrating on topics related to web3, NFTs, the metaverse, and AI. She emphasizes interviewing prominent figures in the Web3 industry. Having conversed with over 30 C-level executives and counting, she brings their insights to our readers. Originally hailing from Singapore, Cindy is currently based in Tbilisi, Georgia. She holds a Bachelor's degree in Communications & Media Studies from the University of South Australia and carries with her a decade of experience in journalism and writing.