LLaVA vs. GPT-4: An Open-Source Model Shows Off Multimodal Abilities and Numerical Limitations
At the recent GPT-4 showcase, one notable feature was the model's ability to hold conversations about images. However, this feature has not yet made it into OpenAI's public offerings. As we have pointed out earlier, an exciting open-source alternative called LLaVA (Large Language and Vision Assistant) has surfaced, and here we set it against Bing's competence in this area.

LLaVA’s Multimodal Potential
Here is how LLaVA described one of the test images: "The intriguing aspect of this image is a man stretching out of a yellow taxi window while gripping a clothesline with a white shirt hanging on it. It's a rather unusual scene, since it's not typical for someone to lean out of a taxi window holding laundry. The man seems to be trying to dry his shirt in a rather unsafe and inefficient manner while the taxi keeps moving."

Mathematical Task Challenges
We tasked LLaVA with a geometry problem comparable to those Bing had solved successfully. Regrettably, LLaVA faltered: it miscalculated the square root of 169 as 13.2, when the correct value is 13. Its response follows.
The goal here is to figure out the diagonal length of a triangle whose sides are 5 cm and 12 cm long. We can utilize the Pythagorean theorem to achieve this, which states that in a right triangle, the square of the length of the hypotenuse (the side opposite the right angle) equals the sum of the squares of the other two sides.

Let's solve this query
To start, we need to determine the diagonal's length, which is the side connecting the two non-adjacent vertices of the triangle. In this case, those vertices relate to the sides measuring 5 cm and 12 cm.
In this context, the hypotenuse corresponds to the triangle's diagonal, and we can calculate it by taking the square root of the sum of the squares of the two given lengths.
Now, let's apply the Pythagorean theorem:
(5 cm)^2 + (12 cm)^2 = diagonal^2
5^2 + 12^2 = diagonal^2
25 + 144 = diagonal^2
169 = diagonal^2
Ultimately, we can find the diagonal's length by taking the square root of both sides:
√169 = diagonal
The resulting length of the diagonal is roughly 13.2 cm.
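The arithmetic in LLaVA's answer is easy to check. Below is a minimal Python sketch using only the standard library. It treats the 5 cm and 12 cm sides as the two legs of a right triangle, which is the assumption LLaVA's own working makes. The helper name `right_triangle_hypotenuse` is our own illustration, not part of any tool discussed here.

```python
import math

def right_triangle_hypotenuse(a: float, b: float) -> float:
    """Hypotenuse of a right triangle with legs a and b (Pythagorean theorem)."""
    return math.sqrt(a**2 + b**2)

# Side lengths from the problem, in cm.
a, b = 5, 12

sum_of_squares = a**2 + b**2            # 25 + 144
diagonal = right_triangle_hypotenuse(a, b)

print(f"a^2 + b^2 = {sum_of_squares}")  # a^2 + b^2 = 169
print(f"diagonal = {diagonal} cm")      # diagonal = 13.0 cm

# 169 is a perfect square, so the exact root is the integer 13.
assert math.isqrt(sum_of_squares) ** 2 == sum_of_squares

# LLaVA's 13.2 cannot be right: squaring it overshoots 169.
print(f"13.2^2 = {round(13.2 ** 2, 2)}")  # 13.2^2 = 174.24
```

Python's `math.hypot(a, b)` returns the same result and avoids overflow for very large inputs; the explicit square root is kept here so the code mirrors the steps in the quoted answer.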
An Unusual Perspective on Images
LLaVA shows an impressive ability to hold a dialogue about images, but it still struggles with mathematical problems. When shown the image of a man leaning out of a yellow taxi window holding a clothesline with a white shirt, it offered a rather unusual interpretation. It noted that such a scene is rare, since people seldom lean out of vehicle windows holding clothing, and suggested that the man might be trying an unusual and possibly dangerous way of drying his shirt while the taxi was moving.
While LLaVA offers exciting multimodal functionality, particularly image-based conversation, it still falls short at solving math problems. Notably, Google outperforms LLaVA here, giving a more accurate answer to a similar mathematical query.
The progress in building AI with multimodal abilities is undoubtedly exciting, and LLaVA is a notable open-source effort in this field. Still, its mathematical performance needs to improve before it matches its proficiency at image analysis.
For those seeking more accurate mathematical responses, Google currently holds the edge.